THE PRIMES CONTAIN ARBITRARILY LONG POLYNOMIAL PROGRESSIONS

Abstract. We establish the existence of infinitely many polynomial progressions
in the primes; more precisely, given any integer-valued polynomials
$P_1, \ldots, P_k \in \mathbb{Z}[m]$ in one unknown $m$ with
$P_1(0) = \ldots = P_k(0) = 0$ and any $\varepsilon > 0$, we show that there are
infinitely many integers $x, m$ with $1 \leq m \leq x^\varepsilon$ such that
$x + P_1(m), \ldots, x + P_k(m)$ are simultaneously prime. The arguments are
based on those in [18], which treated the linear case $P_j = (j-1)m$ and
$\varepsilon = 1$; the main new features are a localization of the shift
parameters (and the attendant Gowers norm objects) to both coarse and fine
scales, the use of PET induction to linearize the polynomial averaging, and some
elementary estimates for the number of points over finite fields in certain
algebraic varieties.

1. Introduction

In 1975, Szemerédi [31] proved that any subset $A$ of the integers of positive upper
density $\limsup_{N \to \infty} \frac{|A \cap [N]|}{|[N]|} > 0$ contains arbitrarily
long arithmetic progressions. Throughout this paper $[N]$ denotes the discrete
interval $[N] := \{1, \ldots, N\}$, and $|X|$ denotes the cardinality of a finite
set $X$. Shortly afterwards, Furstenberg [10] gave an ergodic-theory proof of
Szemerédi's theorem. Furstenberg observed that questions about configurations in
subsets of positive density in the integers correspond to recurrence questions for
sets of positive measure in a probability measure preserving system. This
observation is now known as the Furstenberg correspondence principle.

In 1978, Sárközy [28]^1 (using the Hardy-Littlewood circle method) and Furstenberg
[11] (using the correspondence principle, and ergodic theoretic methods) proved
independently that for any polynomial^2 $P \in \mathbb{Z}[m]$ with $P(0) = 0$, any
set $A \subset \mathbb{Z}$ of positive density contains a pair of points $x, y$ with
difference $y - x = P(m)$ for some positive integer $m \geq 1$. In 1996 Bergelson
and Leibman [?] proved, by purely ergodic theoretic means^3, a vast generalization
of the Furstenberg-Sárközy theorem, establishing the existence of arbitrarily long
polynomial progressions in sets of positive density in the integers.

1 Sárközy actually proved a stronger theorem for the polynomial $P = m^2$, providing
an upper bound for the density of a set $A$ for which $A - A$ does not contain a
perfect square. His estimate was later improved by Pintz, Steiger, and Szemerédi in
[25], and then generalized in [2] for $P = m^k$ and then in [30] for arbitrary $P$
with $P(0) = 0$.

2 We use $\mathbb{Z}[m]$ to denote the space of polynomials of one variable $m$ with
integer-valued coefficients; see Section 5 for further notation along these lines.

3 Unlike Szemerédi's theorem or Sárközy's theorem, no non-ergodic proof of the
Bergelson-Leibman theorem in its full generality is currently known. However, in
this direction Green [17] has shown by Fourier-analytic methods that any set of
integers of positive density contains a triple $\{x, x + n, x + 2n\}$ where $n$ is a
non-zero sum of two squares.

Theorem 1.1 (Polynomial Szemerédi theorem). [?] Let $A \subset \mathbb{Z}$ be a set
of positive upper density, i.e.
$\limsup_{N \to \infty} \frac{|A \cap [N]|}{|[N]|} > 0$. Then given any
integer-valued polynomials $P_1, \ldots, P_k \in \mathbb{Z}[m]$ in one unknown $m$
with $P_1(0) = \ldots = P_k(0) = 0$, $A$ contains infinitely many progressions of
the form $x + P_1(m), \ldots, x + P_k(m)$ with $m > 0$.

Remark 1.2. By shifting $x$ appropriately, one may assume without loss of generality
that one of the polynomials $P_i$ vanishes, e.g. $P_1 = 0$. We shall rely on this
ability to normalize one polynomial of our choosing to be zero at several points in
the proof, most notably in the "PET induction" step in Section 5.10. The arguments
in [?] also establish a generalization of this theorem to higher dimensions, which
will be important to us to obtain a certain uniformly quantitative version of this
theorem later (see Theorem 3.2 and Appendix B).

The ergodic theoretic methods, to this day, have the limitation of only being able
to handle sets of positive density in the integers, although this density is allowed
to be arbitrarily small. However, in 2004 Green and Tao [18] discovered a
transference principle which allowed one (at least in principle) to reduce questions
about configurations in special sets of zero density (such as the primes
$\mathcal{P} := \{2, 3, 5, 7, \ldots\}$) to questions about sets of positive density
in the integers. This opened the door to transferring the Szemerédi type results
which are known for sets of positive upper density in the integers to the prime
numbers. Applying this transference principle to Szemerédi's theorem, Green and Tao
showed that there are arbitrarily long arithmetic progressions in the prime
numbers^4.

In this paper we prove a transference principle for polynomial configurations, which 
then allows us to use (a uniformly quantitative version of) the Bergelson-Leibman 
theorem to prove the existence of arbitrarily long polynomial progressions in the 
primes, or more generally in large subsets of the primes. More precisely, the main 
result of this paper is the following. 

Theorem 1.3 (Polynomial Szemerédi theorem for the primes). Let
$A \subset \mathcal{P}$ be a set of primes of positive relative upper density in the
primes, i.e.
$\limsup_{N \to \infty} \frac{|A \cap [N]|}{|\mathcal{P} \cap [N]|} > 0$. Then given
any $P_1, \ldots, P_k \in \mathbb{Z}[m]$ with $P_1(0) = \ldots = P_k(0) = 0$, $A$
contains infinitely many progressions of the form $x + P_1(m), \ldots, x + P_k(m)$
with $m > 0$.

Remarks 1.4. The main result of [18] corresponds to Theorem 1.3 in the linear case
$P_i := (i-1)m$. The case $k = 2$ of this theorem follows from the results of [28],
[2], [30], which in fact address arbitrary sets of integers with logarithmic type
sparsity. As a by-product of our proof, we shall also be able to impose the bound
$m \leq x^\varepsilon$ for any fixed $\varepsilon > 0$, and thus (by
diagonalization) that $m = x^{o(1)}$; see Remark 2.4. Our results for the case
$A = \mathcal{P}$ are consistent with what is predicted by the Bateman-Horn
conjecture [?], which remains totally open in general (though see [20] for some
partial progress in the linear case).
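To make the statement of Theorem 1.3 concrete, the following is a minimal brute-force sketch (not part of the proof) that lists pairs $(x, m)$ for which $x + P_1(m), \ldots, x + P_k(m)$ are all prime. The function names `prime_table` and `find_progressions`, the search bounds, and the sample polynomials $0, m, m^2$ are illustrative assumptions, not objects from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def prime_table(n: int) -> List[bool]:
    """Sieve of Eratosthenes: table[i] is True exactly when i is prime (n >= 2)."""
    table = [False, False] + [True] * (n - 1)
    for p in range(2, int(n ** 0.5) + 1):
        if table[p]:
            for q in range(p * p, n + 1, p):
                table[q] = False
    return table


def find_progressions(
    polys: Sequence[Callable[[int], int]], x_max: int, m_max: int
) -> List[Tuple[int, int]]:
    """Return all (x, m) with 2 <= x <= x_max, 1 <= m <= m_max (m_max >= 1)
    such that x + P(m) is prime for every P in polys (hypothetical helper)."""
    shifts = {m: [P(m) for P in polys] for m in range(1, m_max + 1)}
    top = max(2, x_max + max(max(s) for s in shifts.values()))
    is_prime = prime_table(top)
    hits = []
    for x in range(2, x_max + 1):
        for m, s in shifts.items():
            # Values outside [0, top] are treated as non-prime.
            if all(0 <= x + v <= top and is_prime[x + v] for v in s):
                hits.append((x, m))
    return hits


# Illustrative choice: P_1 = 0, P_2 = m, P_3 = m^2 (all vanish at m = 0).
sample_hits = find_progressions(
    [lambda m: 0, lambda m: m, lambda m: m * m], x_max=50, m_max=6
)
```

For instance, $(x, m) = (3, 2)$ gives the primes $3, 5, 7$, and $(3, 4)$ gives $3, 7, 19$. A finite search of this kind only illustrates the pattern; it says nothing about infinitude.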



4 Shortly afterwards, the transference principle was also combined in [36] with the
multidimensional Szemerédi theorem [12] (or more precisely a hypergraph lemma
related to this theorem, see [35]) to establish arbitrarily shaped constellations in
the Gaussian primes. A much simpler transference principle is also available for
dense subsets of genuinely random sparse sets; see [37].

Remark 1.5. In view of the generalization of Theorem 1.1 to higher dimensions in
[?], it is reasonable to conjecture that an analogous result to Theorem 1.3 also
holds in higher dimensions, thus any subset of $\mathcal{P}^d$ of positive relative
upper density should contain infinitely many polynomial constellations, for any
choice of polynomials which vanish at the origin. This is however still open even in
the linear case, the key difficulty being that the tensor product of pseudorandom
measures is not pseudorandom. In view of [32] however, it should be possible (though
time-consuming) to obtain a counterpart to Theorem 1.3 for the Gaussian primes.

The philosophy of the proof is similar to the one in [18]. The first key idea is to
think of the primes (or any large subset thereof) as a set of positive relative
density in the set of almost primes, which (after some application of sieve theory,
as in the work of Goldston and Yıldırım [13]) can be shown to exhibit a somewhat
pseudorandom behavior. Actually, for technical reasons it is more convenient to work
not with the sets of primes and almost primes, but rather with certain normalized
weight functions $0 \leq f \leq \nu$ which are^5 supported (or concentrated) on the
primes and almost primes respectively, with $\nu$ obeying certain pseudorandom
measure properties. The functions $f, \nu$ are unbounded, but have bounded
expectation (mean). A major step in the argument is then the establishment of a
Koopman-von Neumann type structure theorem which decomposes $f$ (outside of a small
exceptional set) as a sum $f = f_{U^\perp} + f_U$, where $f_{U^\perp}$ is a
non-negative bounded function with large expectation, and $f_U$ is an error which is
unbounded but is so uniform (in a Gowers-type sense) that it has a negligible impact
on the (weighted) count of polynomial progressions. The remaining component
$f_{U^\perp}$ of $f$, being bounded, non-negative, and of large mean, can then be
handled by (a quantitative version of) the Bergelson-Leibman theorem.

Remark 1.6. As remarked in [18], the above transference arguments can be categorized
as a kind of "finitary" ergodic theory. In the language of traditional (infinitary)
ergodic theory, $f_{U^\perp}$ is analogous to a conditional expectation of $f$
relative to a suitable characteristic factor for the polynomial average being
considered. Based on this analogy, and on the description of this characteristic
factor in terms of nilsystems (see [?]), one would hope that $f_{U^\perp}$ could be
constructed out of nilsequences. In the case of linear averages this correspondence
has already some roots in reality; see [?]. In the special case $A = \mathcal{P}$
one can then hope to use analytic number theory methods to show that $f_{U^\perp}$
is essentially constant, which would lead to a more precise version of Theorem 1.3
in which one obtains a precise asymptotic for the number of polynomial progressions
in the primes, with $x$ and $n$ confined to various ranges. In the case of
progressions of length 4 (or for more general linear patterns, assuming certain
unproven conjectures), such an asymptotic was already established in [20]. While we
expect similar asymptotics to hold for polynomial progressions, we do not pursue
this interesting question here^7.

5 This is an oversimplification, ignoring the "W-trick" necessary to eliminate local
obstructions to uniformity; see Section 2 for full details.

6 The term measure is a bit misleading. It is better to think of $\nu$ as the
Radon-Nikodym derivative of a measure. Still, we stick to this terminology so as not
to confuse the reader who is familiar with [18].

7 One fundamental new difficulty that arises in the polynomial case is that it seems
that one needs to control short correlation sums between primes and nilsequences,
such as on intervals of

As we have already mentioned, the proof of Theorem 1.3 closely follows the arguments
in [18]. However, some significant new difficulties arise when adapting those
arguments^8 to the polynomial setting. The most fundamental such difficulty arises
in one of the very first steps of the argument in [18], in which one localizes the
pattern $x + P_1(m), \ldots, x + P_k(m)$ to a finite interval
$[N] = \{1, \ldots, N\}$. In the linear case $P_i = (i-1)m$ this localization
restricts both $x$ and $m$ to be of size $O(N)$. However, in the polynomial case,
while the base point $x$ is still restricted to size $O(N)$, the shift parameter $m$
is now restricted to a much smaller range $O(M)$, where $M := N^{\eta_0}$ and
$0 < \eta_0 < 1$ is a small constant depending on $P_1, \ldots, P_k$ (one can take
for instance $\eta_0 := 1/2d$, where $d$ is the largest degree of the polynomials
$P_1, \ldots, P_k$; by taking a little more care one can increase this to
$\eta_0 = 1/d$). This eventually forces us to deal with localized averages of the
form^9

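(1) $\mathbb{E}_{m \in [M]} \int_X T^{P_1(m)} f \cdots T^{P_k(m)} f$

where $X := \mathbb{Z}_N := \mathbb{Z}/N\mathbb{Z}$ is the cyclic group with $N$
elements, $f : X \to \mathbb{R}^+$ is a weight function associated to the set $A$,
and $Tg(x) := g(x - 1)$ is the shift operator on $X$. Here we use the ergodic
theory-like^10 notation

$\mathbb{E}_{n \in Y} F(n) := \frac{1}{|Y|} \sum_{n \in Y} F(n)$

for any finite non-empty set $Y$, and

(2) $\int_X f := \mathbb{E}_{x \in X} f(x) = \frac{1}{N} \sum_{x \in X} f(x).$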
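The notation above can be made concrete with a small computation. The sketch below evaluates the average (1) exactly for a toy function on $\mathbb{Z}_N$, with $f$ stored as a list of length $N$; the helper names `shift`, `expectation` and `polynomial_average` are illustrative assumptions, and the toy inputs are not the weight functions used in the paper.

```python
from fractions import Fraction
from typing import Callable, Iterable, List, Sequence


def shift(g: Sequence, n: int, N: int) -> List:
    """Shift operator on Z_N: (T^n g)(x) = g(x - n)."""
    return [g[(x - n) % N] for x in range(N)]


def expectation(values: Iterable) -> Fraction:
    """E_{n in Y} F(n): the plain average over a finite non-empty collection."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)


def polynomial_average(
    f: Sequence, polys: Sequence[Callable[[int], int]], M: int
) -> Fraction:
    """E_{m in [M]} of the integral over X = Z_N of T^{P_1(m)} f ... T^{P_k(m)} f."""
    N = len(f)

    def integral(m: int) -> Fraction:
        shifted = [shift(f, P(m), N) for P in polys]
        total = Fraction(0)
        for x in range(N):
            term = Fraction(1)
            for g in shifted:
                term *= g[x]
            total += term
        return total / N

    return expectation(integral(m) for m in range(1, M + 1))


# A constant function of mean 1/2 gives (1/2)^3 for three polynomials.
half = [Fraction(1, 2)] * 4
avg_const = polynomial_average(half, [lambda m: 0, lambda m: m, lambda m: m * m], 3)

# The indicator of {0} in Z_5 never meets its shift by m = 1 or m = 2.
point = [1, 0, 0, 0, 0]
avg_point = polynomial_average(point, [lambda m: 0, lambda m: m], 2)
```

Exact rational arithmetic is used so that the averages can be compared without rounding; for large $N$ one would switch to floating point.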

We shall normalize $f$ to have mean $\int_X f = \eta_3$ and will also have the
pointwise bound $0 \leq f \leq \nu$ for some "pseudorandom measure" $\nu$ associated
to the almost primes at a sieve level $R := N^{\eta_2}$ for some^11
$0 < \eta_3 \ll \eta_2 \ll \eta_0$ (so $M$ is asymptotically larger than any fixed
power of $R$). The functions $f, \nu$ will be defined formally in (11) and (72)
respectively, but for now let us simply remark that we will have the

the form $[x, x + x^\varepsilon]$, instead of the long correlation sums (such as on
$[x, 2x]$) which appear in the linear theory. Even assuming strong conjectures such
as GRH, it is not clear how to obtain such control.

8 If the measure $\nu$ for the almost primes enjoyed infinitely many
pseudorandomness conditions then one could adapt the arguments in [37] to obtain
Theorem 1.3 rather quickly. Unfortunately, in order for $f$ to have non-zero mean,
one needs to select a moderately large sieve level $R = N^{\eta_2}$ for the measure
$\nu$, which means that one can only impose finitely many (though arbitrarily large)
such pseudorandomness conditions on $\nu$. This necessitates the use of the
(lengthier) arguments in [18] rather than [37].

9 This is an oversimplification as we are ignoring the need to first invoke the
"W-trick" to eliminate local obstructions from small moduli and thus ensure that the
almost primes behave pseudorandomly. See Theorem 2.3 for the precise claim we need.

10 Traditional ergodic theory would deal with the case where the underlying measure
space $\mathbb{Z}_N$ is infinite and the shift range $M$ is going to infinity, thus
informally $N = \infty$ and $M \to \infty$. Unraveling the Furstenberg
correspondence principle, this is equivalent to the setting where $N$ is finite (but
going to infinity) and $M = \omega(N)$ is a very slowly growing function of $N$. In
[18] one is instead working in the regime where $M = N$ are going to infinity at the
same rate. The situation here is thus an intermediate regime where $M = N^{\eta_0}$
goes to infinity at a polynomially slower rate than $N$. In the linear setting, all
of these regimes can be equated using the random dilation trick of Varnavides [39],
but this trick is only available in the polynomial setting if one moves to higher
dimensions, see Appendix B.

11 The "missing" values of $\eta$, such as $\eta_1$, will be described more fully in
Section 2.

bound $\int_X \nu = 1 + o(1)$, together with many higher order correlation estimates
on $\nu$.



Let us defer the (sieve-theoretic) discussion of the pseudorandomness of $\nu$ for
the moment, and focus instead on the (finitary) ergodic theory components of the
argument. If we were in the linear regime $M = N$ used in [18] (with $N$ assumed
prime for simplicity), then repeated applications of the Cauchy-Schwarz inequality
(using the PET induction method) would eventually let us control the average (1) in
terms of Gowers uniformity norms such as

$\|f\|_{U^d(\mathbb{Z}_N)} := \left( \mathbb{E}_{m^{(0)}, m^{(1)} \in [N]^d} \int_X \prod_{\omega \in \{0,1\}^d} T^{m_1^{(\omega_1)} + \ldots + m_d^{(\omega_d)}} f \right)^{1/2^d}$

for some sufficiently large $d$ (depending on $P_1, \ldots, P_k$; eventually they
will be of size $O(1/\eta_1)$ for some $\eta_2 \ll \eta_1 \ll \eta_0$), where
$\omega = (\omega_1, \ldots, \omega_d)$ and
$m^{(i)} = (m_1^{(i)}, \ldots, m_d^{(i)})$ for $i = 0, 1$. If instead we were in the
pseudo-infinitary regime $M = M(N)$ for some slowly growing function
$M : \mathbb{Z}^+ \to \mathbb{Z}^+$, repeated applications of the van der Corput
lemma and PET induction would allow one to control these averages by the
Gowers-Host-Kra seminorms $\|f\|_d$ from [22], which in our finitary setting would
be something like

$\|f\|_d := \left( \mathbb{E}_{m^{(0)}, m^{(1)} \in [M_1] \times \ldots \times [M_d]} \int_X \prod_{\omega \in \{0,1\}^d} T^{m_1^{(\omega_1)} + \ldots + m_d^{(\omega_d)}} f \right)^{1/2^d}$

where $M_1, \ldots, M_d$ are slowly growing functions of $N$ which we shall
deliberately keep unspecified^12. In our intermediate setting $M = N^{\eta_0}$,
however, neither of these two quantities seem to be exactly appropriate. Instead,
after applying the van der Corput lemma and PET induction one ends up considering an
averaged localized Gowers norm of the form^13



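$\|f\|_{U^{Q([H]^t)}_{\sqrt{M}}} := \left( \mathbb{E}_{h \in [H]^t} \int_X \mathbb{E}_{m^{(0)}, m^{(1)} \in [\sqrt{M}]^d} \prod_{\omega \in \{0,1\}^d} T^{Q_1(h) m_1^{(\omega_1)} + \ldots + Q_d(h) m_d^{(\omega_d)}} f \right)^{1/2^d}$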

where $H = N^{\eta_7}$ is a small power of $N$ (much smaller than $M$ or $R$), $t$
is a natural number depending only on $P_1, \ldots, P_k$ (and of size
$O(1/\eta_1)$), and $Q_1, \ldots, Q_d \in \mathbb{Z}[h_1, \ldots, h_t]$ are certain
polynomials (of $t$ variables $h_1, \ldots, h_t$) which depend on
$P_1, \ldots, P_k$. Indeed, we will eventually be able (see Theorem 4.5) to
establish a polynomial analogue of the generalized von Neumann theorem in [18],
which roughly speaking will assert that (if $\nu$ is sufficiently pseudorandom) any
component of $f$ which is "locally Gowers-uniform" in the sense that the above norm
is small, and which is bounded pointwise by $O(\nu + 1)$, will have a negligible
impact on the average (1). To exploit this fact, we shall essentially repeat the
arguments in [18] (with some notational changes to deal with the presence of the
polynomials $Q_i$ and the short shift ranges) to establish (assuming $\nu$ is
sufficiently pseudorandom)



12 In the traditional ergodic setting $N = \infty$, $M \to \infty$, one would take
multiple limit superiors as $M_1, \ldots, M_d \to \infty$, choosing the order in
which these parameters go to infinity carefully; see [35].

13 Again, this is a slight oversimplification as we are ignoring the effects of the
"W-trick".

an analogue of the Koopman-von Neumann-type structure theorem in that paper, namely
a decomposition $f = f_{U^\perp} + f_U$ (outside of a small exceptional set) where
$f_{U^\perp}$ is bounded by $O(1)$, is non-negative and has mean roughly $\delta$,
and $f_U$ is locally Gowers-uniform and thus has a negligible impact on (1).
Combining this with a suitable quantitative version (Theorem 3.2) of the
Bergelson-Leibman theorem one can then conclude Theorem 1.3.
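As a rough illustration of what Gowers-type uniformity measures, the sketch below computes the standard (global) norm $\|f\|_{U^d(\mathbb{Z}_N)}$ of a real-valued function by brute force, using the equivalent parametrization by a base point $x$ and differences $h_1, \ldots, h_d$. This is the classical norm only, not the averaged localized norm used in this paper, and the function name `gowers_norm` is an illustrative assumption.

```python
from itertools import product
from typing import Sequence


def gowers_norm(f: Sequence[float], d: int) -> float:
    """Brute-force U^d norm on Z_N for real-valued f (cost about N^(d+1) * 2^d).

    Uses ||f||^(2^d) = E_{x, h_1..h_d} prod_{omega in {0,1}^d} f(x + omega . h).
    """
    N = len(f)
    total = 0.0
    for x in range(N):
        for h in product(range(N), repeat=d):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                offset = sum(w * hi for w, hi in zip(omega, h))
                term *= f[(x + offset) % N]
            total += term
    mean = total / N ** (d + 1)
    # The mean is non-negative for real f; max() only guards against rounding.
    return max(mean, 0.0) ** (1.0 / 2 ** d)


# The alternating function (-1)^x on Z_8 has mean zero (U^1 norm 0) yet
# U^2 norm 1: it is not uniform, reflecting its correlation with a character.
alternating = [(-1.0) ** x for x in range(8)]
u1 = gowers_norm(alternating, 1)
u2 = gowers_norm(alternating, 2)
```

The example shows why higher norms are needed: a function can have no bias in its plain average while still being highly structured.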

We have not yet discussed how one constructs the measure $\nu$ and establishes the
required pseudorandomness properties. We shall construct $\nu$ as a truncated
divisor sum at level $R = N^{\eta_2}$, although instead of using the
Goldston-Yıldırım divisor sum as in [2], [?] we shall use a smoother truncation (as
in [22], [21], [20]) as it is slightly easier to estimate^14. The pseudorandomness
conditions then reduce, after standard sieve theory manipulations, to the entirely
local problem of understanding the pseudorandomness of the functions
$\Lambda_p : F_p \to \mathbb{R}^+$ on finite fields $F_p$, defined for all primes
$p$ by $\Lambda_p(x) := \frac{p}{p-1}$ when $x \neq 0 \mod p$ and
$\Lambda_p(x) := 0$ otherwise. Our pseudorandomness conditions shall involve
polynomials, and so one is soon faced with the standard arithmetic problem of
counting the number of points over $F_p$ of an algebraic variety. Fortunately, the
polynomials that we shall encounter will be linear in one or more of the variables
of interest, which allows us to obtain a satisfactory count of these points without
requiring deeper tools from arithmetic such as class field theory or the Weil
conjectures.
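The local picture can be checked numerically in small cases. The sketch below assumes the normalization $\Lambda_p(x) = p/(p-1)$ for $x \neq 0 \mod p$ and $\Lambda_p(x) = 0$ otherwise (so that $\Lambda_p$ has mean exactly 1 on $F_p$), and compares a brute-force point count on a curve $a(x) y + b(x) = 0$, which is linear in $y$, against the elementary count obtained by solving for $y$. All function names and the sample curve are illustrative assumptions.

```python
from fractions import Fraction
from typing import Callable


def local_factor(p: int, x: int) -> Fraction:
    """Assumed normalization: p/(p-1) off the zero residue class, 0 on it."""
    return Fraction(p, p - 1) if x % p != 0 else Fraction(0)


def mean_local_factor(p: int) -> Fraction:
    """Average of the local factor over F_p."""
    return sum(local_factor(p, x) for x in range(p)) / p


def count_points_brute(p: int, a: Callable[[int], int], b: Callable[[int], int]) -> int:
    """#{(x, y) in F_p^2 : a(x) y + b(x) = 0}, by enumeration."""
    return sum(
        1 for x in range(p) for y in range(p) if (a(x) * y + b(x)) % p == 0
    )


def count_points_linear(p: int, a: Callable[[int], int], b: Callable[[int], int]) -> int:
    """Same count, using linearity in y: one solution y when a(x) != 0,
    all p values of y when a(x) = b(x) = 0, and none otherwise."""
    total = 0
    for x in range(p):
        if a(x) % p != 0:
            total += 1
        elif b(x) % p == 0:
            total += p
    return total


def a_poly(x: int) -> int:
    return x * x + 1


def b_poly(x: int) -> int:
    return x
```

For the sample curve $(x^2 + 1) y + x = 0$ the count is $p$ minus the number of square roots of $-1$ in $F_p$, which is the kind of elementary estimate that linearity in one variable makes available.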

1.7. Acknowledgements. The authors thank Brian Conrad for valuable discussions
concerning algebraic varieties, Peter Sarnak for encouragement, Ben Green for help
with the references, and Elon Lindenstrauss, Akshay Venkatesh and Lior Silberman for
helpful conversations.



2. Notation and initial preparation 

In this section we shall fix some important notation, conventions, and assumptions
which will then be used throughout the proof of Theorem 1.3. Indeed, all of the
sub-theorems and lemmas used to prove Theorem 1.3 will be understood to use the
conventions and assumptions in this section. We thus recommend that the reader go
through this section carefully before moving on to the other sections of the paper.

Throughout this paper we fix the set $A \subset \mathcal{P}$ and the polynomials
$P_1, \ldots, P_k \in \mathbb{Z}[m]$ appearing in Theorem 1.3. Henceforth we shall
assume that the polynomials are all distinct, since duplicate polynomials clearly
have no impact on the conclusion of Theorem 1.3. Since we are also assuming
$P_i(0) = 0$ for all $i$, we conclude that

(3) $P_i - P_{i'}$ is non-constant for all $1 \leq i < i' \leq k$.



14 However, in contrast to the arguments in [21], [20] we will not be able to
completely localize the estimations on the Riemann zeta function $\zeta(s)$ to a
neighborhood of the pole $s = 1$, for rather minor technical reasons, and so will
continue to need the classical estimates (108) on $\zeta(s)$ near the line
$s = 1 + it$.

By hypothesis, the upper density

$\delta_0 := \limsup_{N' \to \infty} \frac{|A \cap [N']|}{|\mathcal{P} \cap [N']|}$

is strictly positive. We shall allow all implied constants to depend on the
quantities $\delta_0, P_1, \ldots, P_k$.

By the prime number theorem

(4) $|\mathcal{P} \cap [N']| = (1 + o(1)) \frac{N'}{\log N'}$

we can find an infinite sequence of integers $N'$ going to infinity such that

(5) $|A \cap [N']| > \frac{1}{2} \delta_0 \frac{N'}{\log N'}.$

Henceforth the parameter $N'$ is always understood to obey (5).

We shall need eight (!) small quantities

$0 < \eta_7 \ll \eta_6 \ll \eta_5 \ll \eta_4 \ll \eta_3 \ll \eta_2 \ll \eta_1 \ll \eta_0 \ll 1$

which depend on $\delta_0$ and on $P_1, \ldots, P_k$. All of the assertions in this
paper shall be made under the implicit assumption that $\eta_0$ is sufficiently
small depending on $\delta_0, P_1, \ldots, P_k$; that $\eta_1$ is sufficiently small
depending on $\delta_0, P_1, \ldots, P_k, \eta_0$; and so forth down to $\eta_7$,
which is assumed sufficiently small depending on
$\delta_0, P_1, \ldots, P_k, \eta_0, \ldots, \eta_6$ and should thus be viewed as
being extremely microscopic in size. For the convenience of the reader we briefly
and informally describe the purpose of each of the $\eta_i$, their approximate size,
and the importance of being that size, as follows.



• The parameter $\eta_0$ controls the coarse scale $M := N^{\eta_0}$. It can be set
  equal to $1/2d$, where $d$ is the largest degree of the polynomials
  $P_1, \ldots, P_k$. If one desires the quantity $m$ in Theorem 1.3 to be smaller
  than $x^\varepsilon$ then one can achieve this by choosing $\eta_0$ to be less
  than $\varepsilon$. The smallness of $\eta_0$ is necessary in order to deduce
  Theorem 1.3 from Theorem 2.3 below.

• The parameter $\eta_1$ (or more precisely its reciprocal $1/\eta_1$) controls the
  degree of pseudorandomness needed on a certain measure $\nu$ to appear later. Due
  to the highly recursive nature of the "PET induction" step (Section 5.10), it will
  need to be rather small; it is essentially the reciprocal of an Ackermann function
  of the maximum degree $d$ and the number of polynomials $k$. The smallness of
  $\eta_1$ is needed in order to estimate all the correlations of $\nu$ which arise
  in the proofs of Theorem 4.5 and Theorem 4.7.

• The parameter $\eta_2$ controls the sieve level $R := N^{\eta_2}$. It can be taken
  to be $c\eta_1/d$ for some small absolute constant $c > 0$. It needs to be small
  relative to $\eta_1$ in order that the inradius bound of Proposition [?] is
  satisfied.

• The parameter $\eta_3$ measures the density of the function $f$. It is basically
  of the form $c\delta_0\eta_2$ for some small absolute constant $c > 0$. It needs
  to be small relative to $\eta_2$ in order to establish the mean bound (12).

• The parameter $\eta_4$ measures the degree of accuracy required in the Koopman-von
  Neumann type structure theorem (Theorem 4.7). It needs to be substantially smaller
  than $\eta_3$ to make the proof of Theorem 3.16 in Section [?] work. The exact
  dependence on $\eta_3$ involves the quantitative bounds arising from the
  Bergelson-Leibman theorem (see Theorem 3.2). In particular, as the only known
  proof of this theorem is infinitary, no explicit bounds for $\eta_4$ in terms of
  $\eta_3$ are currently available.

• The parameter $\eta_5$ controls the permissible error allowed when approximating
  indicator functions by a smoother object, such as a polynomial; it needs to be
  small relative to $\eta_4$ in order to make the proof of the abstract structure
  theorem (Theorem 7.1) work correctly. It can probably be taken to be roughly of
  the form $\exp(-C/\eta_4^C)$ for some absolute constant $C > 0$, though we do not
  attempt to make $\eta_5$ explicit here.

• The parameter $\eta_6$ (or more precisely its reciprocal $1/\eta_6$) controls the
  maximum degree, dimension, and number of the polynomials that are encountered in
  the argument. It needs to be small relative to $\eta_5$ in order for the
  polynomials arising in the proof of Proposition 7.4 to obey the orthogonality
  hypothesis (49) of Theorem 7.1. It can in principle be computed in terms of
  $\eta_5$ by using a sufficiently quantitative version of the Weierstrass
  approximation theorem, though we do not do so here.

• The parameter $\eta_7$ controls the fine scale $H := N^{\eta_7}$, which arises
  during the "van der Corput" stage of the proof in Section 5.10. It needs to be
  small relative to $\eta_6$ in order that the "clearing denominators" step in the
  proof of Proposition 6.5 works correctly. It is probably safe to take $\eta_7$ to
  be $\eta_6^{100}$, although we shall not explicitly do this. On the other hand,
  $\eta_7$ cannot vanish entirely, due to the need to average out the influence of
  "bad primes" in Corollary 11.2 and Theorem 12.1.

It is crucial to the argument that the parameters are ordered in exactly the above
way. In particular, the fine scale $H = N^{\eta_7}$ needs to be much smaller than
the coarse scale $M = N^{\eta_0}$.

We use the following asymptotic notation: 

• We use $X = O(Y)$, $X \ll Y$, or $Y \gg X$ to denote the estimate $|X| \leq CY$
  for some quantity $0 < C < \infty$ which can depend on
  $\delta_0, P_1, \ldots, P_k$. If we need $C$ to also depend on additional
  parameters we denote this by subscripts, e.g. $X = O_K(Y)$ means that
  $|X| \leq C_K Y$ for some $C_K$ depending on $\delta_0, P_1, \ldots, P_k, K$.

• We use $X = o(Y)$ to denote the estimate $|X| \leq c(N') Y$, where $c$ is a
  quantity depending on $\delta_0, P_1, \ldots, P_k, \eta_0, \ldots, \eta_7, N'$
  which goes to zero as $N' \to \infty$ for each fixed choice of
  $\delta_0, P_1, \ldots, P_k, \eta_0, \ldots, \eta_7$. If we need $c(N')$ to depend
  on additional parameters we denote this by subscripts, e.g. $X = o_K(Y)$ denotes
  the estimate $|X| \leq c_K(N') Y$, where $c_K(N')$ is a quantity which goes to
  zero as $N' \to \infty$ for each fixed choice of
  $\delta_0, P_1, \ldots, P_k, \eta_0, \ldots, \eta_7, K$.

We shall implicitly assume throughout that $N'$ is sufficiently large depending on
$\delta_0, P_1, \ldots, P_k, \eta_0, \ldots, \eta_7$; in particular, all quantities
of the form $o(1)$ will be small.

Next, we perform the "W-trick" from [18] to eliminate obstructions to uniformity
arising from small moduli. We shall need a slowly growing function $w = w(N')$ of
$N'$. For sake of concreteness^15 we shall set

$w := \frac{1}{10} \log \log \log N'.$

We then define the quantity $W$ by

$W := \prod_{p < w} p$

and the integer^16 $N$ by

(6) $N := \lfloor \frac{N'}{2W} \rfloor.$

Here and in the sequel, all products over $p$ are understood to range over primes,
and $\lfloor x \rfloor$ is the greatest integer less than or equal to $x$. The
quantity $W$ will be used to eliminate the local obstructions to pseudorandomness
arising from small prime moduli; one can think of $W$ (or more precisely the cyclic
group $\mathbb{Z}_W$) as the finitary counterpart of the "profinite factor"
generated by the periodic functions in infinitary ergodic theory. From the prime
number theorem (4) one sees that

(7) $W = O(\log \log N)$

and

(8) $N' = N^{1 + o(1)}.$

In particular the asymptotic limit $N' \to \infty$ is equivalent to the asymptotic
limit $N \to \infty$ for the purposes of the $o()$ notation, and so we shall now
treat $N$ as the underlying asymptotic parameter instead of $N'$.
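The quantities of the W-trick are easy to compute, which also shows how slowly they grow. The sketch below assumes the reading $w = \frac{1}{10} \log \log \log N'$, $W = \prod_{p < w} p$ and $N = \lfloor N'/(2W) \rfloor$; the function names and the optional override of $w$ are illustrative assumptions added so that a non-trivial $W$ can be seen at small scales.

```python
import math
from typing import List, Optional, Tuple


def primes_below(w: float) -> List[int]:
    """All primes p with p < w, by trial division (w is tiny in practice)."""
    return [
        p
        for p in range(2, math.ceil(w))
        if all(p % q for q in range(2, math.isqrt(p) + 1))
    ]


def w_trick(n_prime: int, w: Optional[float] = None) -> Tuple[int, int, int]:
    """Return (W, phi(W), N) for a given N' (requires N' >= 3).

    If w is None, the concrete choice w = (1/10) log log log N' is used;
    passing w explicitly is a hypothetical convenience for experiments.
    """
    if w is None:
        w = math.log(math.log(math.log(n_prime))) / 10
    small_primes = primes_below(w)
    W = math.prod(small_primes)                      # empty product is 1
    phi_W = math.prod(p - 1 for p in small_primes)   # Euler totient of W
    N = n_prime // (2 * W)
    return W, phi_W, N


default_choice = w_trick(10 ** 6)        # w is about 0.1, so W = 1
forced_choice = w_trick(10 ** 6, w=6)    # W = 2 * 3 * 5 = 30
```

With the default choice, $W = 1$ for every $N'$ that fits in a computer, which is consistent with $W$ being only a device for the asymptotic argument; any function $w$ tending to infinity slowly enough serves the same purpose.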



From (5), (6), (8) we have

$|A \cap [\frac{1}{2} W N] \setminus [w]| \gg \frac{WN}{\log N}$

(recall that implied constants can depend on $\delta_0$). On the other hand, since
$A$ consists entirely of primes, all the elements in $A \setminus [w]$ are coprime
to $W$. By the pigeonhole principle^17, we may thus find $b = b(N) \in [W]$ coprime
to $W$ such that

(9) $|\{x \in [\frac{1}{2} N] : Wx + b \in A\}| \gg \frac{W}{\phi(W)} \frac{N}{\log N}$

where $\phi(W) = \prod_{p < w} (p - 1)$ is the Euler totient function of $W$, i.e.
the number of elements of $[W]$ which are coprime to $W$.
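The pigeonhole step can be sketched as a direct count: among the residues $b \in [W]$ coprime to $W$, pick one for which the most points $x \in [\frac{1}{2} N]$ satisfy $Wx + b \in A$. The function name `best_residue` and the tie-breaking rule (smallest $b$) are illustrative assumptions; the argument itself only needs some residue whose count is at least the average.

```python
from math import gcd
from typing import Iterable, Tuple


def best_residue(A: Iterable[int], W: int, N: int) -> Tuple[int, int]:
    """Return (b, count): b in {1, ..., W} coprime to W maximizing
    #{x in {1, ..., N // 2} : W * x + b in A}. A should hold distinct integers."""
    counts = {b: 0 for b in range(1, W + 1) if gcd(b, W) == 1}
    for a in A:
        b = a % W or W            # represent the residue class inside [W]
        x = (a - b) // W
        if b in counts and 1 <= x <= N // 2:
            counts[b] += 1
    # Largest count wins; ties go to the smallest residue.
    b_best = max(counts, key=lambda b: (counts[b], -b))
    return b_best, counts[b_best]


# Toy data: 7, 13, 19 are 1 mod 6 and 11 is 5 mod 6.
choice = best_residue({7, 11, 13, 19}, W=6, N=10)
```

Since there are $\phi(W)$ admissible residues, the best one captures at least a $1/\phi(W)$ fraction of the admissible elements, which is the source of the factor $W/\phi(W)$ in (9) after rescaling by $W$.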



15 Actually, the arguments here work for any choice of function
$w : \mathbb{Z}^+ \to \mathbb{Z}^+$ which is bounded by
$\frac{1}{10} \log \log \log N'$ and which goes to infinity as $N' \to \infty$. This
is important if one wants an explicit lower bound on the number of polynomial
progressions in a certain range.

16 Unlike previous work such as [18], we will not need to assume that $N$ is prime
(which is the finitary equivalent of the underlying space $X$ being totally
ergodic), although it would not be hard to ensure that this were the case if
desired. This is ultimately because we shall clear denominators as soon as they
threaten to occur, and so there will be no need to perform division in
$X = \mathbb{Z}_N$. On the other hand, this clearing of denominators will mean that
many (fine) multiplicative factors such as $Q(h)$ shall attach themselves to the
(coarse-scale) shifts one is averaging over. In any case, the "W-trick" of passing
from the integers $\mathbb{Z}$ to a residue class $W \cdot \mathbb{Z} + b$ can
already be viewed as a kind of reduction to the totally ergodic setting, as it
eliminates the effects of small periods.

17 In the case $A = \mathcal{P}$, we may use the prime number theorem in arithmetic
progressions (or the Siegel-Walfisz theorem) to choose $b$, for instance to set
$b = 1$. However, we will not need to exploit this ability to fix $b$ here.

Let us fix this $b$. We introduce the underlying measure space
$X := \mathbb{Z}_N = \mathbb{Z}/N\mathbb{Z}$, with the uniform probability measure
given by (2). We also introduce the coarse scale $M := N^{\eta_0}$, the sieve level
$R := N^{\eta_2}$, and the fine scale $H := N^{\eta_7}$. It will be important to
observe the following size hierarchy:

(10) $1 \ll W \ll W^{1/\eta_7} \ll H \ll H^{1/\eta_6} \ll R \ll R^{1/\eta_1} \ll M \ll N = |X|.$

Indeed each quantity on this hierarchy is larger than any bounded power of the
preceding quantity, for suitable choices of the $\eta$ parameters, for instance
$R^{O(1/\eta_1)} \leq M^{1/4}$.

Remark 2.1. In the linear case [18] we have $M = N$, while the parameter $H$ is not
present (or can be thought of as $O(1)$). We shall informally refer to parameters of
size^18 $O(M)$ as coarse-scale parameters, and parameters of size $H$ as fine-scale
parameters; we shall use the symbol $m$ to denote coarse-scale parameters and $h$
for fine-scale parameters (reserving $x$ for elements of $X$). Note that because the
sieve level $R$ is intermediate between these parameters, we will be able to easily
average the pseudorandom measure $\nu$ over coarse-scale parameters, but not over
fine parameters. Fortunately, our averages will always involve at least one
coarse-scale parameter, and after performing the coarse-scale averages first we will
have enough control on main terms and error terms to then perform the fine averages.
The need to keep the fine parameters short arises because at one key "Weierstrass
approximation" stage of the argument, we shall need to control the product of an
extremely large number (about $O(1/\eta_6)$, in fact) of averages (or more precisely
"dual functions"), and this will cause many fine parameters to be multiplied
together in order to clear denominators. This is still tolerable because $H$ remains
smaller than $R, M, N$ even after being raised to a power $O(1/\eta_6)$. Note it is
key here that the number of powers $O(1/\eta_6)$ does not depend on $\eta_7$. It
will therefore be important to keep large parts of our argument uniform in the
choice of $\eta_7$, although we can and will allow $\eta_7$ to influence $o(1)$
error terms. The quantity $H$ (and thus $\eta_7$) will not actually make an impact
on the argument until Section [?], when the local Gowers norms are introduced.



We define the standard shift operator T : X — > X on X by Tx := x + 1, with the 
associated action on functions g : X — > R by Tg := goT^ 1 , thus T n g(x) — g(x — n) 
for any n S Z. We introduce the normalized counting function / : X — » R + by 
setting 

<b(W) 1 
(11) f(x) := y w ' logR whenever x <E [-N] and Wx + b e A 

and f(x) — otherwise, where we identify [|iV] with a subset of Zjv in the usual 
manner. The use of logi? instead of logiV as a normalizing factor is necessary 
in order to bound / pointwise by the pseudorandom measure v which we shall 
encounter in later sections; the ratio 772 between logi? and logA^ represents the 
relative density between the primes and the almost primes. Observe from that 
/ has relatively large mean: 

$$\int_X f \gg \delta_0 \eta_2.$$


18 Later on we shall also encounter some parameters of size 0(\/~M) or 0(M 1 / 4 '), which we 
shall also consider to be coarse-scale. 
In particular we have

$$\int_X f \geq \eta_3. \tag{12}$$
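As an informal numerical illustration of the normalization in (11) (not part of the argument), the following Python sketch computes the mean of $f$ over $\mathbb{Z}_N$ in a toy case. Everything concrete here is an assumption made for illustration: $A$ is taken to be the full set of primes, $W$ is the product of the primes below a small cutoff `w`, the residue `b` and the exponent `eta2` are chosen by hand, and the helper names are hypothetical.

```python
import math

def prime_sieve(n):
    """Return a list is_prime[0..n-1] built with the sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = [False] * len(range(p * p, n, p))
    return sieve

def mean_of_f(N, w, b, eta2):
    """Mean over Z_N of f(x) = (phi(W)/W) log R on {x in [N/2] : Wx + b prime}.

    Assumptions (illustrative only): A = all primes, W = product of primes < w,
    R = N**eta2.
    """
    W = math.prod(p for p in range(2, w) if all(p % q for q in range(2, p)))
    phi_W = sum(1 for a in range(1, W + 1) if math.gcd(a, W) == 1)
    log_R = eta2 * math.log(N)
    is_prime = prime_sieve(W * (N // 2) + b + 1)
    hits = sum(1 for x in range(1, N // 2 + 1) if is_prime[W * x + b])
    return (phi_W / W) * log_R * hits / N

if __name__ == "__main__":
    # With N = 10^4, w = 5 (so W = 6), b = 1 and eta2 = 1/4, the mean is of
    # the order of eta2, in line with the role of eta2 as a relative density.
    print(mean_of_f(10_000, 5, 1, 0.25))
```

In this toy case the mean comes out comparable to `eta2`, which is the qualitative behavior the text describes; the constants are not meant to match those of the paper.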




Remark 2.2. We will eventually need to take $\eta_2$ (and hence $\eta_3$) to be quite small,
in order to ensure that the measure $\nu$ obeys all the required pseudorandomness
properties (this is controlled by the parameter $\eta_1$, which has not yet made a formal
appearance). Fortunately, the Bergelson-Leibman theorem (Theorem 1.1, or more
precisely Theorem 3.2 below) works for sets of arbitrarily small positive density,
or equivalently for (bounded) functions of arbitrarily small positive mean.$^{19}$ This
allows us to rely on fairly crude constructions for $\nu$ which will be easier to estimate.
This is in contrast to the recent work of Goldston, Yildirim, and Pintz on
prime gaps, in which it was vitally important that the density of the prime counting
function relative to the almost prime counting function be as high as possible, which
in turn required a near-optimal (and thus highly delicate) construction of the almost
prime counting function.

To prove Theorem 1.3 it will suffice to prove the following quantitative estimate:

Theorem 2.3 (Polynomial Szemerédi theorem in the primes, quantitative version).
Let the notation and assumptions be as above. Then we have

$$\mathbb{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} f \cdots T^{P_k(Wm)/W} f \geq \frac{1}{2} c\Bigl(\frac{\eta_3}{2}\Bigr) - o(1) \tag{13}$$

where the function $c()$ is the one appearing in Theorem 3.2. (Observe that since
$P_i(0) = 0$ for all $1 \leq i \leq k$, the polynomial $P_i(Wm)/W$ has integer coefficients.)

Indeed, suppose that the estimate (13) held. Then by expanding all the averages
and using (11) we conclude that



Here we are using the fact (from (10)) that $P_i(Wm)$ is much less than $N/2$ for
$m \in [M]$, and so one cannot "wrap around" the cyclic group $\mathbb{Z}_N$. Observe that
each element in the set on the left-hand side yields a different pair $(x', m') :=
(Wx + b, Wm)$ with the property that $x' + P_1(m'), \dots, x' + P_k(m') \in A$. On the
other hand, as $N \to \infty$, the right-hand side goes to infinity. The claim follows.

Remark 2.4. The above argument in fact proves slightly more than is stated by
Theorem 1.3. Indeed, it establishes a large number of pairs $(x', m')$ with $x' +$

19 As in [18], the exact quantitative bound provided by this theorem (or more precisely Theorem
3.2) will not be relevant for qualitative results such as Theorem 1.3. Of course, such bounds would
be important if one wanted to know how soon the first polynomial progression in the primes (or
a dense subset thereof) occurs; for instance such bounds influence how small $\eta_4$ and thus all
subsequent $\eta$s need to be, which in turn influences the exact size of the final $o(1)$ error in Theorem
2.3. Unfortunately, as the only known proof of Theorem 1.1 proceeds via infinitary ergodic theory,
no explicit bounds are currently known; however, it is reasonable to expect (in view of results such
as [16], [34]) that effective bounds will eventually become available.



$P_1(m'), \dots, x' + P_k(m') \in A$, $x' \in [N]$, and $m' \in [M]$; more precisely, there are at
least$^{20}$ $cNM/\log^k N$ such pairs for some $c$ depending on $\delta_0$, $P_1, \dots, P_k$, $\eta_0, \dots,
\eta_7$.$^{21}$ By throwing away the contribution of those $x'$ of size $\ll N$ (which can be done
either by modifying $f$ in the obvious manner, or by using a standard upper bound
sieve to estimate this component) one can in fact assume that $x'$ is comparable to
$N$. Similarly one may assume $m'$ to be comparable to $N^{\eta_0}$. The upshot of this is
that for any given $\eta_0 > 0$ one in fact obtains infinitely many "short" polynomial
progressions $x' + P_1(m'), \dots, x' + P_k(m')$ with $m'$ comparable to $(x')^{\eta_0}$. One can
take smaller and smaller values of $\eta_0$ and diagonalize to obtain the same statement
with the bound $m' = (x')^{o(1)}$. This stronger version of Theorem 1.3 is already new
in the linear case $P_i = (i - 1)m$, although it is not too hard to modify the arguments
in [18] to establish it. Note that an inspection of the Furstenberg correspondence
principle reveals that the Bergelson-Leibman theorem (Theorem 1.1) has an even
stronger statement in this direction, namely that if $A$ has positive upper density
in the integers $\mathbb{Z}$ rather than the primes $\mathcal{P}$, then there exists a fixed $m \neq 0$ for
which the set $\{x : x + P_i(m) \in A \text{ for all } 1 \leq i \leq k\}$ is infinite (in fact, it can be
chosen to have positive upper density). Such a statement might possibly be true
for primes (or dense subsets of the primes) but is well beyond the technology of this
paper. For instance, to establish such a statement even in the simple case $P_1 = 0$,
$P_2 = m$ is tantamount to asserting that the primes have bounded gaps arbitrarily
often, which is still not known unconditionally even after the recent breakthroughs
in [T^. On the other hand it may be possible to establish such a result with a
logarithmic dependence between $x'$ and $m'$, e.g. $m' = O(\log^{O(1)} x')$. We do not
pursue this issue here.

It remains to prove Theorem 2.3. This shall occupy the remainder of the paper.
The proof is lengthy, but splits into many non-interacting parts; see Figure 1 for a
diagram of the logical dependencies of this paper.

2.5. Miscellaneous notation. To conclude this section we record some additional
notation which will be used heavily throughout this paper.

We have already used the notation $\mathbb{Z}[\mathbf{m}]$ to denote the ring of integer-coefficient
polynomials in one indeterminate$^{22}$ $\mathbf{m}$. More generally one can consider $\mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]$,
the ring of integer-coefficient polynomials in $d$ indeterminates $\mathbf{x}_1, \dots, \mathbf{x}_d$. More
generally still we have $\mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]^D$, the space of $D$-tuples of polynomials in
20 To obtain such a bound it is important to remember that we can take $w$ and hence $W$
to be as slowly growing as one pleases; see [18] for further discussion. Note that if $A = \mathcal{P}$ is
the full set of primes then the Bateman-Horn conjecture [3] predicts an asymptotic of the form
$(\gamma + o(1))NM/\log^k N$ for an explicitly computable $\gamma$; we do not come close to verifying this
conjecture here.

21 The arguments in this paper can be easily generalized to give a lower bound of
$cNM_1 \cdots M_r/\log^k N$ on the number of tuples $(x', m'_1, \dots, m'_r)$ with $x' + P_1(m'_1, \dots, m'_r), \dots, x' +
P_k(m'_1, \dots, m'_r) \in A$, $x' \in [N]$, $m'_i \in [M_i]$ for $1 \leq i \leq r$, and $P_j \in \mathbb{Z}[\mathbf{m}_1, \dots, \mathbf{m}_r]$ for $1 \leq j \leq k$. To
obtain this one would only need to slightly modify the arguments in Section 5 (see Remark 5.19),
whereas the rest of the proof remains the same.

22 We shall use boldface letters to denote abstract indeterminates, reserving the non-boldface
letters for concrete realizations of these indeterminates, which in this paper will always be in the
ring of integers $\mathbb{Z}$.
Figure 1. The main theorems in this paper and their logical dependencies.
The numbers and letters next to the arrows indicate
the section(s) where the implication is proven, and which appendices
are used; if no section is indicated, the result is proven immediately
after it is stated. Self-contained arguments are indicated
using a filled-in circle. [Diagram not reproduced.]



$\mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]$; note that each element of this space defines a polynomial map from
$\mathbb{Z}^d$ to $\mathbb{Z}^D$. Thus we shall think of elements of $\mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]^D$ as $D$-dimensional-valued
integer-coefficient polynomials over $d$ variables. The degree of a monomial
$\mathbf{x}_1^{n_1} \cdots \mathbf{x}_d^{n_d}$ is $n_1 + \dots + n_d$; the degree of a polynomial in $\mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]^D$ is the highest
degree of any monomial which appears in any component of the polynomial; we
adopt the convention that the zero polynomial has degree $-\infty$. We say that two
$D$-dimensional-valued polynomials $P, Q \in \mathbb{Z}[\mathbf{x}_1, \dots, \mathbf{x}_d]^D$ are parallel if we have
$nP = mQ$ for some non-zero integers $n, m$.
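The two definitions above (degree and parallelism) can be made concrete with a small Python sketch. The representation is an assumption chosen for illustration only: a $D$-dimensional-valued polynomial in $d$ variables is stored as a dictionary mapping each exponent tuple $(n_1, \dots, n_d)$ to its length-$D$ tuple of integer coefficients, and the function names are hypothetical.

```python
from fractions import Fraction

def degree(poly):
    """Highest total degree of a monomial occurring in any component.

    Returns -infinity for the zero polynomial, following the convention above.
    """
    degs = [sum(expo) for expo, coeffs in poly.items() if any(coeffs)]
    return max(degs) if degs else float("-inf")

def are_parallel(P, Q):
    """True if n*P == m*Q for some non-zero integers n, m."""
    def flatten(poly):
        # One entry per (monomial, component) with a non-zero coefficient.
        return {(e, i): c for e, cs in poly.items() for i, c in enumerate(cs) if c}
    p, q = flatten(P), flatten(Q)
    if p.keys() != q.keys():
        return False          # different supports can never be rescaled to agree
    if not p:
        return True           # both are zero: 1*P == 1*Q
    ratios = {Fraction(p[key], q[key]) for key in p}
    return len(ratios) == 1   # a single common ratio m/n works for every entry

if __name__ == "__main__":
    # P = (2*x1, 4*x2^2) and Q = (3*x1, 6*x2^2) satisfy 3P = 2Q.
    P = {(1, 0): (2, 0), (0, 2): (0, 4)}
    Q = {(1, 0): (3, 0), (0, 2): (0, 6)}
    print(degree(P), are_parallel(P, Q))
```

Exact rational arithmetic is used so that the common-ratio test is not affected by rounding.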

If $\vec{n} = (n_1, \dots, n_D)$ and $\vec{m} = (m_1, \dots, m_D)$ are two vectors in $\mathbb{Z}^D$, we use $\vec{n} \cdot \vec{m} :=
n_1 m_1 + \dots + n_D m_D \in \mathbb{Z}$ to denote their dot product.

If $f : X \to \mathbb{R}$ and $g : X \to \mathbb{R}$ are two functions, we say that $f$ is pointwise
bounded by $g$, and write $f \leq g$, if we have $f(x) \leq g(x)$ for all $x \in X$. Similarly
if $g : X \to \mathbb{R}^+$ is non-negative, we write $f = O(g)$ if we have $f(x) = O(g(x))$
uniformly for all $x \in X$. If $A \subset X$, we use $1_A : X \to \{0, 1\}$ to denote the indicator
function of $A$; thus $1_A(x) = 1$ when $x \in A$ and $1_A(x) = 0$ when $x \notin A$. Given any
statement $P$, we use $1_P$ to denote $1$ when $P$ is true and $0$ when $P$ is false, thus for
instance $1_A(x) = 1_{x \in A}$.

We define a convex body to be an open bounded convex subset of a Euclidean space
$\mathbb{R}^d$. We define the inradius of a convex body to be the radius of the largest ball
that is contained inside the body; this will be a convenient measure of how "large"
a body is.$^{23}$

23 In our paper there will only be essentially two types of convex bodies: "coarse-scale" convex
bodies of inradius at least $M^{1/4}$, and "fine-scale" convex bodies, of inradius $\gg H$. In
almost all cases, the convex bodies will in fact simply be rectangular boxes.

3. Three pillars of the proof

As in [18], our proof of Theorem 2.3 rests on three independent pillars - a quantitative
Szemerédi-type theorem (proven by traditional ergodic theory), a transference
principle (proven by finitary ergodic theory), and the construction of a pseudorandom
majorant $\nu$ for $f$ (with the pseudorandomness proven by sieve theory). In this
section we describe each of these pillars separately, and state where they are proven.

3.1. The quantitative Szemerédi-type theorem. Theorem 2.3 concerns a multiple
polynomial average of an unbounded function $f$. To control such an object,
we first need to establish an estimate for bounded functions $g$. This is achieved as
follows (cf. [18, Proposition 2.3]):

Theorem 3.2 (Polynomial Szemerédi theorem, quantitative version). Let the notation
and assumptions be as in the previous section. Let $\delta > 0$, and let $g : X \to \mathbb{R}$
be any function obeying the pointwise bound $0 \leq g \leq 1 + o(1)$ and the mean bound
$\int_X g \geq \delta - o(1)$. Then we have

$$\mathbb{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g \cdots T^{P_k(Wm)/W} g \geq c(\delta) - o(1) \tag{14}$$

for some $c(\delta) > 0$ depending on $\delta, P_1, \dots, P_k$, but independent of $N$ or $W$.

It is not hard to see that this theorem implies Theorem 1.1. The converse is not
immediately obvious (the key point being, of course, that the bound $c(\delta)$ in (14) is
uniform in both $N$ and $W$); however, it is not hard to deduce Theorem 3.2 from
(a multidimensional version of) Theorem 1.1 and the Furstenberg correspondence
principle; one can also use the uniform version of the Bergelson-Leibman theorem
proved in |S]. As the arguments here are fairly standard, and are unrelated to those
in the remainder of the paper, we defer the proof of Theorem 3.2 to Appendix B.

3.3. Pseudorandom measures. To describe the other two pillars of the argument
it is necessary for the measure $\nu$ to make its appearance. (The precise properties
of $\nu$, however, will not actually be used until Sections 5 and 6.)
Definition 3.4 (Measure). [18] A measure is a non-negative function $\nu : X \to \mathbb{R}^+$
with the total mass estimate

$$\int_X \nu = 1 + o(1) \tag{15}$$

and the pointwise bound

$$\nu = O_\varepsilon(N^\varepsilon) \tag{16}$$

for any $\varepsilon > 0$.

Remark 3.5. As remarked in [18], it is really $\nu \mu_X$ which is a measure rather than
$\nu$, where $\mu_X$ is the uniform probability measure on $X$; $\nu$ should be more accurately
referred to as a "probability density" or "weight function". However we retain the
terminology "measure" for compatibility with [18]. The condition (16) is needed
here to discard certain error terms arising from the boundary effects of shift ranges
(such as those arising from the van der Corput lemma). This condition does not
prominently feature in [18], as the shifts range over all of $\mathbb{Z}_N$, which has no boundary.
Fortunately, (16) is very easy to establish for the majorant that we shall end
up using. We note though that while the right-hand side of (16) does not look too
large, we cannot possibly afford to allow factors such as $N^\varepsilon$ to multiply into error
terms such as $o(1)$, as these terms will almost certainly cease to be small at that
point. Hence we can only really use (16) in situations where we already have a
polynomial gain in $N$, which can for instance arise by exploiting the gaps in (10).

The simplest example of a measure is the constant measure $\nu = 1$. Another model
example worth keeping in mind is the random measure where $\nu(x) = \log R$ with
independent probability $1/\log R$ for each $x \in X$, and $\nu(x) = 0$ otherwise. The
following definitions attempt to capture certain aspects of this random measure,
which will eventually be satisfied by a certain truncated divisor sum concentrated
on almost primes. These definitions are rather technical, and their precise form is
only needed in later sections of the paper. They are somewhat artificial in nature,
being a compromise between the type of control needed to establish the relative
polynomial Szemerédi theorem (Theorem 3.16) and the type of control that can be
easily verified for truncated divisor sums (Theorem 3.18). It may well be that a
simpler notion of pseudorandomness can be given.

Definition 3.6 (Polynomial forms condition). Let $\nu : X \to \mathbb{R}^+$ be a measure. We
say that $\nu$ obeys the polynomial forms condition if, given any $0 \leq J, d \leq 1/\eta_1$, any
polynomials $Q_1, \dots, Q_J \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_d]$ of $d$ unknowns of total degree at most
$1/\eta_1$ and coefficients at most $W^{1/\eta_1}$, with $Q_j - Q_{j'}$ non-constant for every distinct
$j, j' \in [J]$, for every $\varepsilon > 0$, and for every convex body $\Omega \subset \mathbb{R}^d$ of inradius at least
$N^\varepsilon$, and contained in the ball $B(0, M^2)$, we have the bound

$$\mathbb{E}_{\vec{h} \in \Omega \cap \mathbb{Z}^d} \int_X \prod_{j \in [J]} T^{Q_j(\vec{h})} \nu = 1 + o_\varepsilon(1). \tag{17}$$

Note the first appearance of the parameter $\eta_1$, which is controlling the degree of
the pseudorandomness here. Note also that the bound is uniform in the coefficients
of the polynomials $Q_1, \dots, Q_J$.
Examples 3.7. The mean bound (15) is a special case of (17); another simple example is

$$\mathbb{E}_{h \in [H]} \int_X \nu \, T^{h} \nu \, T^{Wh} \nu = 1 + o(1).$$

Observe that the smaller one makes $\eta_1$, the stronger the polynomial forms condition
becomes.
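For intuition only, the random measure model mentioned before Definition 3.6 can be simulated directly. The Python sketch below is not part of the argument: it draws $\nu(x) = \log R$ with independent probability $1/\log R$ on a cyclic group of modest size, and checks numerically that the total mass and a three-fold correlation with shifts $0$, $h$, $Wh$ are both close to $1$. The sizes, the value of `log_R`, the value of `W`, the seed, and the function names are all illustrative assumptions.

```python
import random

def random_measure(N, log_R, seed=0):
    """nu(x) = log_R with independent probability 1/log_R, and 0 otherwise."""
    rng = random.Random(seed)
    p = 1.0 / log_R
    return [log_R if rng.random() < p else 0.0 for _ in range(N)]

def mean(nu):
    """Integral of nu over X = Z_N with the uniform probability measure."""
    return sum(nu) / len(nu)

def triple_correlation(nu, H, W):
    """E_{h in [H]} of the integral of nu * T^h nu * T^{W h} nu over Z_N.

    Uses the convention T^n g(x) = g(x - n) from the text.
    """
    N = len(nu)
    total = 0.0
    for h in range(1, H + 1):
        total += sum(nu[x] * nu[(x - h) % N] * nu[(x - W * h) % N]
                     for x in range(N)) / N
    return total / H

if __name__ == "__main__":
    nu = random_measure(20_000, 3.0, seed=1)
    print(mean(nu), triple_correlation(nu, 10, 6))
```

Because the three shifts are distinct, the three factors are independent in this model, which is why the correlation stays near $1$ rather than growing like a power of $\log R$.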

Remark 3.8. Definition 3.6 is a partial analogue of the "linear forms condition" in
[18]. The parameter $\eta_1$ is playing multiple roles, controlling the degree, dimension,
number and size of the polynomials in question. It would be more natural to split
this parameter into four parameters to control each of these attributes separately,
but we have chosen to artificially unify these four parameters in order to simplify
the notation slightly. The parameter $\varepsilon$ will eventually be set to be essentially $\eta_7$,
but we leave it arbitrary here to emphasize that the definition of pseudorandomness
does not depend on the choice of $\eta_7$ (or $H$). This will be important later, basically
because we need to select $\nu$ (or more precisely $\eta_2$ (or $R$), which is involved in the
construction of $\nu$) before we are allowed to choose $\eta_7$.

The next condition is in a similar spirit, but considerably more complicated; it 
allows for arbitrarily many factors in the average, as long as they have a partly 
linear structure, and that they are organized into relatively small groups, with a 
separate coarse-scale averaging applied to each of the groups. 

Definition 3.9 (Polynomial correlation condition). Let $\nu : X \to \mathbb{R}^+$ be a measure.
We say that $\nu$ obeys the polynomial correlation condition if, given any $0 \leq D, J, L \leq
1/\eta_1$, any integers $D', D'', K \geq 0$, and any $\varepsilon > 0$, and given any vector-valued
polynomials

$$P_j \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_{D''}]^{D}, \qquad Q_{j,k} \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_{D''}]^{D'}, \qquad S_l \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_{D''}]^{D'}$$

of degree at most $1/\eta_1$ for $j \in [J]$, $k \in [K]$, $l \in [L]$ obeying the non-degeneracy
conditions

• For any distinct $j, j' \in [J]$ and any $k \in [K]$, the $D + D'$-dimensional-valued
polynomials $(P_j, Q_{j,k})$ and $(P_{j'}, Q_{j',k})$ are not parallel.

• The coefficients of $P_j$ and $S_l$ are bounded in magnitude by $W^{1/\eta_1}$.

• The $D'$-dimensional-valued polynomials $S_l$ are distinct as $l$ varies in $[L]$.

and given any convex body $\Omega \subset \mathbb{R}^D$ of inradius at least $M^{1/4}$ and convex bodies
$\Omega' \subset \mathbb{R}^{D'}$, $\Omega'' \subset \mathbb{R}^{D''}$ of inradius at least $N^\varepsilon$, with all convex bodies contained in


18 



TERENCE TAO AND TAMAR ZIEGLER 



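$B(0, M^2)$, then

$$\mathbb{E}_{\vec{n} \in \Omega' \cap \mathbb{Z}^{D'}} \mathbb{E}_{\vec{h} \in \Omega'' \cap \mathbb{Z}^{D''}} \int_X \Bigl[ \prod_{k \in [K]} \mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_{j,k}(\vec{h}) \cdot \vec{n}} \nu \Bigr] \prod_{l \in [L]} T^{S_l(\vec{h}) \cdot \vec{n}} \nu = 1 + o_{D', D'', K, \varepsilon}(1). \tag{18}$$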

Remark 3.10. It will be essential here that $D', D'', K$ can be arbitrarily large$^{24}$;
otherwise, this condition becomes essentially a special case of the polynomial forms
condition. Indeed in our argument, these quantities will get as large as $O(1/\eta_6)$,
which is far larger than $1/\eta_1$. As in the preceding definition, $\varepsilon$ will eventually be
set to equal essentially $\eta_7$, but we refrain from doing so here to keep the definition
of pseudorandomness independent of $\eta_7$, to avoid the appearance of circularity in
the argument.

Remark 3.11. The correlation condition (18) would follow from the polynomial
forms condition (17) if we had the pointwise bounds

$$\mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_{j,k}(\vec{h}) \cdot \vec{n}} \nu = 1 + o(1) \tag{19}$$

for each $k \in [K]$ and all $\vec{h}, \vec{n}$. Unfortunately, such a bound is too optimistic to
be true: for instance, if $P_j(\vec{h}) = Q_{j,k}(\vec{h}) = 0$ then the left-hand side is an average
of $\nu^J$, which is almost certainly much larger than 1. In the number-theoretic
applications in which $\nu$ is supposed to concentrate on almost primes, one also has
similar problems when $P_j(\vec{h}), Q_{j,k}(\vec{h})$ are non-zero but very smooth (i.e. they have
many small prime factors slightly larger than $w$). In [18] these smooth cases were
modeled by a weight function $\tau$, which obeyed arbitrarily large moment conditions
which led to integral estimates analogous to (18). In this paper we have found it
more convenient to not explicitly create the weight function, instead placing the
integral estimate (18) in the correlation condition hypothesis directly. In fact one
can view (18) as an assertion that (19) holds "asymptotically almost everywhere"
(cf. Proposition 6.2 below).

Remark 3.12. One could generalize (18) slightly by allowing the number of terms
$J$ in the $j$ product to depend on $k$, but we will not need this strengthening and
in any event it follows automatically from (18) by a Hölder inequality argument
similar to that used in Lemma 3.14 below.

Definition 3.13 (Pseudorandom measure). A pseudorandom measure is any measure
$\nu$ which obeys both the polynomial forms condition and the correlation
condition.



The following lemma (cf. [18, Lemma 3.4]) is useful:

Lemma 3.14. If $\nu$ is a pseudorandom measure, then so is $\nu_{1/2} := (1 + \nu)/2$
(possibly with slightly different decay rates for the $o(1)$ error terms).

24 An analogous phenomenon occurs in the correlation condition in [18], where it was essential
that the exponent $q$ appearing in that condition (which is roughly analogous to $K$ here) could be
arbitrarily large.
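Proof. It is clear that $\nu_{1/2}$ satisfies (15) and (16). Because $\nu$ obeys the polynomial
forms condition (17), one can easily verify using the binomial formula that $\nu_{1/2}$
does also. Now we turn to the polynomial correlation condition, which requires a
little more care. Setting $Q_{j,k}$ to be independent of $k$, we obtain that

$$\mathbb{E}_{\vec{n} \in \Omega' \cap \mathbb{Z}^{D'}} \mathbb{E}_{\vec{h} \in \Omega'' \cap \mathbb{Z}^{D''}} \int_X \Bigl( \mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_j(\vec{h}) \cdot \vec{n}} \nu \Bigr)^{K} \prod_{l \in [L]} T^{S_l(\vec{h}) \cdot \vec{n}} \nu = 1 + o_{D', D'', K, \varepsilon}(1)$$

for all $K \geq 0$ and $P_j, Q_j, S_l$ obeying the hypotheses of the correlation condition.
By the binomial formula this implies that

$$\mathbb{E}_{\vec{n} \in \Omega' \cap \mathbb{Z}^{D'}} \mathbb{E}_{\vec{h} \in \Omega'' \cap \mathbb{Z}^{D''}} \int_X \Bigl( \mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_j(\vec{h}) \cdot \vec{n}} \nu - 1 \Bigr)^{K} \prod_{l \in [L]} T^{S_l(\vec{h}) \cdot \vec{n}} \nu = 0^K + o_{D', D'', K, \varepsilon}(1).$$

(Recall of course that $0^0 = 1$.) Let us take $K$ to be a large even integer. Another
application of the binomial formula allows one to replace the final $\nu$ by $\nu_{1/2}$. By
the triangle inequality in a weighted Lebesgue norm $l^K$, we may then replace the
other occurrences of $\nu$ by $\nu_{1/2}$ also:

$$\mathbb{E}_{\vec{n} \in \Omega' \cap \mathbb{Z}^{D'}} \mathbb{E}_{\vec{h} \in \Omega'' \cap \mathbb{Z}^{D''}} \int_X \Bigl( \mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_j(\vec{h}) \cdot \vec{n}} \nu_{1/2} - 1 \Bigr)^{K} \prod_{l \in [L]} T^{S_l(\vec{h}) \cdot \vec{n}} \nu_{1/2} = 0^K + o_{D', D'', K, \varepsilon}(1).$$

This was only proven for even $K$, but follows also for odd $K$ by the Cauchy-Schwarz
inequality (94). By Hölder's inequality we obtain a similar statement when the $Q_j$
are now allowed to vary in $k$:

$$\mathbb{E}_{\vec{n} \in \Omega' \cap \mathbb{Z}^{D'}} \mathbb{E}_{\vec{h} \in \Omega'' \cap \mathbb{Z}^{D''}} \int_X \Bigl[ \prod_{k \in [K]} \Bigl( \mathbb{E}_{\vec{m} \in \Omega \cap \mathbb{Z}^{D}} \prod_{j \in [J]} T^{P_j(\vec{h}) \cdot \vec{m} + Q_{j,k}(\vec{h}) \cdot \vec{n}} \nu_{1/2} - 1 \Bigr) \Bigr] \prod_{l \in [L]} T^{S_l(\vec{h}) \cdot \vec{n}} \nu_{1/2} = 0^K + o_{D', D'', K, \varepsilon}(1).$$

Applying the binomial formula again we see that $\nu_{1/2}$ obeys (18) as desired. □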



3.15. Transference principle. We can now state the second pillar of our argument
(cf. [18, Theorem 3.5]).

Theorem 3.16 (Relative polynomial Szemerédi theorem). Let the notation and
assumptions be as in Section 2. Then given any pseudorandom measure $\nu$ and any
$g : X \to \mathbb{R}$ obeying the pointwise bound $0 \leq g \leq \nu$ and the mean bound

$$\int_X g \geq \eta_3, \tag{20}$$

we have

$$\mathbb{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g \cdots T^{P_k(Wm)/W} g \geq \frac{1}{2} c\Bigl(\frac{\eta_3}{2}\Bigr) - o(1) \tag{21}$$

where $c()$ is the function appearing in Theorem 3.2.

Apart from inessential factors of 2 (and the substantially worse decay rates concealed
within the $o(1)$ notation), this theorem is significantly stronger than Theorem
3.2, which is essentially the special case $\nu = 1$. In fact we shall derive Theorem
3.16 from Theorem 3.2 using the transference principle technology from [18]. The
argument is lengthy and will occupy Sections 4-7.

3.17. Construction of the majorant. To conclude Theorem 2.3 from Theorem
3.16 and (12) it clearly suffices to show (cf. [18, Proposition 9.1])

Theorem 3.18 (Existence of pseudorandom majorant). Let the notation and assumptions
be as in Section 2. Then there exists a pseudorandom measure $\nu$ such
that the function $f$ defined in (11) enjoys the pointwise bound $0 \leq f \leq \nu$.

This is the third pillar of the argument. The majorant v acts as an "enveloping 
sieve" for the primes (or more precisely, for the primes equal to b modulo W) , in the 
sense of |57j. It is defined explicitly in Section [SJ However, for the purposes 
of the proof of the other pillars of the argument (Theorem 3.2 and Theorem 3.16)
it will not be necessary to know the precise definition of $\nu$, only that $\nu$ majorizes $f$
and is pseudorandom. In order to establish this pseudorandomness it is necessary
that $\eta_2$ is small compared to $\eta_1$. On the other hand, observe that $\nu$ does not depend
on $H$ and thus is insensitive to the choice of $\eta_7$.

The proof of Theorem 3.18 follows similar lines to those in ^S], |2Jj], except that the
"local" or "singular series" calculation is more complicated, as one is now forced
to count solutions to one or more polynomial equations over $F_p$ rather than linear
equations. Fortunately, it turns out that the polynomials involved happen to be
linear in at least one "coarse-scale" variable, and so the number of solutions can be
counted relatively easily, without recourse to any deep arithmetic facts (such as the
Weil conjectures). We establish Theorem 3.18 in Sections 8-12 using some basic
facts about convex bodies, solutions to polynomial equations in $F_p$, and distribution
of prime numbers which are recalled in Appendices C, D, E respectively.

4. Overview of proof of transference principle 

We now begin the proof of the relative polynomial Szemerédi theorem (Theorem
3.16). As in [18], this theorem will follow quickly from three simpler components.
The first is the uniformly quantitative version of the ordinary polynomial Szemerédi
theorem, Theorem 3.2, which will be proven in Appendix B. The second is a
"polynomial generalized von Neumann theorem" (Theorem 4.5) which allows us to
neglect sufficiently "locally Gowers-uniform" contributions to
(21). The third is a "local Koopman-von Neumann structure theorem" (Theorem
4.7) which decomposes a function $0 \leq f \leq \nu$ (outside of a negligible set) into
a bounded positive component $f_{U^\perp}$ and a locally Gowers-uniform error $f_U$. The
purpose of this section is to formally state the latter two components and show
how they imply Theorem 3.16; the proofs of these components will then occupy
subsequent sections of the paper.

The pseudorandom measure $\nu$ plays no role in the ordinary polynomial Szemerédi
theorem, Theorem 3.2. In the von Neumann theorem, Theorem 4.5, the pseudorandomness
of $\nu$ is exploited via the polynomial forms condition (Definition 3.6).
In the structure theorem, Theorem 4.7, it is instead the polynomial correlation
condition (Definition 3.9) which delivers the benefits of pseudorandomness.



4.1. Local Gowers norms. As mentioned in the introduction, a key ingredient
in the proof of Theorem 3.16 will be the introduction of a norm $\|\cdot\|_{U^{\vec{Q}([H]^t, W)}_{\sqrt{M}}}$ which
controls averages such as those in (21). It is here that the parameter $\eta_7$ first makes
an appearance, via the shift range $H$. The purpose of this subsection is to define
these norms formally.

Let $f : X \to \mathbb{R}$ be a function. For any $d \geq 1$, recall that the (global) Gowers
uniformity norm $\|f\|_{U^d}$ of $f$ is defined by the formula

$$\|f\|_{U^d}^{2^d} := \mathbb{E}_{m_1, \dots, m_d \in \mathbb{Z}_N} \int_X \prod_{(\omega_1, \dots, \omega_d) \in \{0,1\}^d} T^{\omega_1 m_1 + \dots + \omega_d m_d} f.$$

An equivalent definition is

$$\|f\|_{U^d}^{2^d} := \mathbb{E}_{m_1^{(0)}, \dots, m_d^{(0)}, m_1^{(1)}, \dots, m_d^{(1)} \in \mathbb{Z}_N} \int_X \prod_{(\omega_1, \dots, \omega_d) \in \{0,1\}^d} T^{m_1^{(\omega_1)} + \dots + m_d^{(\omega_d)}} f$$

as can be seen by making the substitutions $m_i^{(1)} := m_i^{(0)} + m_i$, and shifting the
integral by $m_1^{(0)} + \dots + m_d^{(0)}$.

We will not directly use the global Gowers norms in this paper, because the range
of the shifts $m$ in those norms is too large for our applications. Instead, we shall
need local versions of this norm. For any steps $a_1, \dots, a_d \in \mathbb{Z}$, we define the local
Gowers uniformity norm $U^{a_1, \dots, a_d}_{\sqrt{M}}$ by$^{25}$

$$\|f\|^{2^d}_{U^{a_1, \dots, a_d}_{\sqrt{M}}} := \mathbb{E}_{m_1^{(0)}, \dots, m_d^{(0)}, m_1^{(1)}, \dots, m_d^{(1)} \in [\sqrt{M}]} \int_X \prod_{(\omega_1, \dots, \omega_d) \in \{0,1\}^d} T^{m_1^{(\omega_1)} a_1 + \dots + m_d^{(\omega_d)} a_d} f. \tag{22}$$

25 We will need to pass from shifts of size $O(M)$ to shifts of size $O(\sqrt{M})$ to avoid dealing with
certain boundary terms (similar to those arising in the van der Corput lemma).






Thus for instance, when $\sqrt{M} = N$ and $a_1, \dots, a_d$ are invertible in $\mathbb{Z}_N$, then the
$U^{a_1, \dots, a_d}_{\sqrt{M}}$ norm is the same as the $U^d$ norm. When $\sqrt{M}$ is much smaller than $N$,
however, there appears to be no obvious comparison between these two norms. It is
not immediately obvious that the local Gowers norm is indeed a norm, but we shall
show this in Appendix A, where basic properties of these norms are established. In
practice we shall take $a_1, \dots, a_d$ to be rather small compared to $R$ or $M$; indeed
these steps will have size $O(H^{O(1)})$.
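The comparison between the local and global norms can be checked by brute force on a very small cyclic group. The Python sketch below is an illustration under stated assumptions, not part of the paper's argument: `L` plays the role of $\sqrt{M}$, the functions are real-valued lists indexed by $\mathbb{Z}_N$, and the function names are hypothetical. It evaluates the right-hand side of (22) and the first formula for $\|f\|_{U^d}^{2^d}$, using the convention $T^n f(x) = f(x - n)$.

```python
from itertools import product

def local_gowers_power(f, steps, L):
    """Right-hand side of (22): each m_i^(0), m_i^(1) ranges over [L] = {1..L}."""
    N, d = len(f), len(steps)
    total = 0.0
    for ms in product(range(1, L + 1), repeat=2 * d):
        m0, m1 = ms[:d], ms[d:]
        for x in range(N):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                shift = sum((m1[i] if omega[i] else m0[i]) * steps[i]
                            for i in range(d))
                term *= f[(x - shift) % N]   # T^n f(x) = f(x - n)
            total += term
    return total / (N * L ** (2 * d))

def global_gowers_power(f, d):
    """||f||_{U^d}^{2^d}: average over m in Z_N^d of the product of T^{omega.m} f."""
    N = len(f)
    total = 0.0
    for m in product(range(N), repeat=d):
        for x in range(N):
            term = 1.0
            for omega in product((0, 1), repeat=d):
                shift = sum(o * mi for o, mi in zip(omega, m))
                term *= f[(x - shift) % N]
            total += term
    return total / N ** (d + 1)

if __name__ == "__main__":
    f = [1.0, 0.0, 2.0, 1.0, 0.0, 3.0, 1.0]      # arbitrary function on Z_7
    print(global_gowers_power(f, 2))
    print(local_gowers_power(f, (2, 3), 7))      # L = N, steps invertible mod 7
```

With `L = N` and steps invertible modulo `N` the two quantities agree, as stated above; for smaller `L` they generally differ.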

Remark 4.2. One can generalize this norm to complex valued functions $f$ by
conjugating those factors of $f$ for which $\omega_1 + \dots + \omega_d$ is odd. If we then set
$f = e(\phi) = e^{2\pi i \phi}$ for some phase function $\phi : X \to \mathbb{R}/\mathbb{Z}$, then the local Gowers
$\|f\|_{U^{a_1, \dots, a_d}_{\sqrt{M}}}$ norm is informally measuring the extent to which the $d$-fold difference

$$\sum_{\omega_1, \dots, \omega_d \in \{0,1\}} (-1)^{\omega_1 + \dots + \omega_d} \phi\bigl(x + m_1^{(\omega_1)} a_1 + \dots + m_d^{(\omega_d)} a_d\bigr)$$

is close to zero, where $x$ ranges over $X$ and $m_i^{(0)}, m_i^{(1)}$ range over $[\sqrt{M}]$ for $i \in [d]$.
Even more informally, these norms are measuring the extent to which $\phi$ "behaves
like" a polynomial of degree less than $d$ on arithmetic progressions of the form

$$\{x + m_1 a_1 + \dots + m_d a_d : m_1, \dots, m_d \in [\sqrt{M}]\}$$

where $x \in X$ is arbitrary. The global Gowers norm $U^d$, in contrast, measures
similar behavior over the entire space $X$.
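The informal statement about phases can be tested numerically in the simplest case $d = 2$. The following Python sketch is only an illustration under assumptions of our own choosing: the group is $\mathbb{Z}_7$, `L` stands in for $\sqrt{M}$ and is taken equal to $N$, the sign of the shift follows $T^n f(x) = f(x - n)$, and the function names are hypothetical. It uses the complex variant of the norm, conjugating the factors with $\omega_1 + \omega_2$ odd.

```python
import cmath
from itertools import product

def local_gowers_power_complex(f, steps, L):
    """Complex analogue of (22): conjugate the factors with omega_1+...+omega_d odd."""
    N, d = len(f), len(steps)
    total = 0j
    for ms in product(range(1, L + 1), repeat=2 * d):
        m0, m1 = ms[:d], ms[d:]
        for x in range(N):
            term = 1 + 0j
            for omega in product((0, 1), repeat=d):
                shift = sum((m1[i] if omega[i] else m0[i]) * steps[i]
                            for i in range(d))
                value = f[(x - shift) % N]
                term *= value.conjugate() if sum(omega) % 2 else value
            total += term
    return total / (N * L ** (2 * d))

def phase(N, poly):
    """f(x) = e(poly(x)/N) = exp(2 pi i poly(x) / N) on Z_N."""
    return [cmath.exp(2j * cmath.pi * poly(x) / N) for x in range(N)]

if __name__ == "__main__":
    N = 7
    linear = phase(N, lambda x: 3 * x)        # degree 1 < d = 2
    quadratic = phase(N, lambda x: x * x)     # degree 2 = d
    print(abs(local_gowers_power_complex(linear, (1, 1), N)))
    print(abs(local_gowers_power_complex(quadratic, (1, 1), N)))
```

For the linear phase the second-order difference vanishes identically, so the $U^2$-type quantity equals $1$; for the quadratic phase on $\mathbb{Z}_7$ with full-range shifts it drops to $1/7$, reflecting that a degree-$2$ phase is not detected as a polynomial of degree less than $2$.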



We shall estimate the Gowers-uniform contributions to (21) via repeated application
of the van der Corput lemma using the standard polynomial ergodic theorem (PET)
induction scheme. This will eventually allow us to control these contributions, not
by a single local Gowers-uniform norm, but rather by an average of such norms, in
which the shifts $h_1, \dots, h_d$ are fine and parameterized by a certain polynomial. More
precisely, given any $t \geq 0$ and any $d$-tuple $\vec{Q} = (Q_1, \dots, Q_d) \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_t, \mathbf{W}]^d$ of
polynomials, we define the averaged local Gowers uniformity norm $U^{\vec{Q}([H]^t, W)}_{\sqrt{M}}$ by
the formula

$$\|f\|^{2^d}_{U^{\vec{Q}([H]^t, W)}_{\sqrt{M}}} := \mathbb{E}_{\vec{h} \in [H]^t} \|f\|^{2^d}_{U^{Q_1(\vec{h}, W), \dots, Q_d(\vec{h}, W)}_{\sqrt{M}}}. \tag{23}$$

Inserting (22) we thus have

$$\|f\|^{2^d}_{U^{\vec{Q}([H]^t, W)}_{\sqrt{M}}} = \mathbb{E}_{\vec{h} \in [H]^t} \mathbb{E}_{m_1^{(0)}, \dots, m_d^{(0)}, m_1^{(1)}, \dots, m_d^{(1)} \in [\sqrt{M}]} \int_X \prod_{(\omega_1, \dots, \omega_d) \in \{0,1\}^d} T^{m_1^{(\omega_1)} Q_1(\vec{h}, W) + \dots + m_d^{(\omega_d)} Q_d(\vec{h}, W)} f. \tag{24}$$

In Appendix A we show that the local Gowers uniformity norms are indeed norms;
by the triangle inequality in $l^{2^d}$, this implies that the averaged local Gowers uniformity
norms are also norms. To avoid degeneracies we will assume that none of
the polynomials $Q_1, \dots, Q_d$ vanish.

Remark 4.3. The distinction between local Gowers uniform norms and their averaged
counterparts is a necessary feature of our "quantitative" setting. In the
"qualitative" setting of traditional (infinitary) ergodic theory (where $X$ is infinite),
there is no need for this sort of distinction; if the local Gowers uniformity norms
go to zero as $M \to \infty$ for the shifts $h_1 = \dots = h_d = 1$, then it is not hard (using
various forms of the Cauchy-Schwarz-Gowers inequality, such as those in Appendix
A) to show the same is true for any other fixed choice of shifts $h_1, \dots, h_d$, and hence
the averaged norms will also go to zero as $M \to \infty$ for any fixed choice of $\vec{Q}$ and
$H$. The converse implications are also easy to establish. Thus one can use a single
local Gowers uniformity norm, $U^{1, \dots, 1}_{\sqrt{M}}$, to control everything in the limit $M \to \infty$
with $H$ bounded; this then corresponds to the Gowers-Host-Kra seminorms used in
[22], [24] to control polynomial averages. However in our more quantitative setting,
where $H$ is allowed to grow like a (very small) power of $N$, we cannot afford to use
the above equivalences (as they will amplify the $o(1)$ errors in our arguments to
be unacceptably large), and so must turn instead to the more complicated-seeming
averaged local Gowers uniformity norms.

4.4. The polynomial generalized von Neumann theorem. We are now ready
to state the second main component of the proof of Theorem 3.16 (the first component
being Theorem 3.2).

Theorem 4.5 (Polynomial generalized von Neumann theorem). Let the notation
and assumptions be as in Section 2. Then there exists a $d \geq 2$, $t \geq 0$ of size $O(1)$
and $d$-tuple $\vec{Q} \in \mathbb{Z}[\mathbf{h}_1, \dots, \mathbf{h}_t, \mathbf{W}]^d$ of degree $O(1)$ and coefficients $O(1)$, with none
of the components of $\vec{Q}$ vanishing, as well as a constant $c > 0$ depending only on
$P_1, \dots, P_k$, such that one has the inequality

$$\Bigl| \mathbb{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g_1 \cdots T^{P_k(Wm)/W} g_k \Bigr| \ll \min_{1 \leq i \leq k} \|g_i\|^{c}_{U^{\vec{Q}([H]^t, W)}_{\sqrt{M}}} + o(1) \tag{25}$$

for any functions $g_1, \dots, g_k : X \to \mathbb{R}$ obeying the pointwise bound $|g_i| \leq 1 + \nu$ for
all $1 \leq i \leq k$ and $x \in X$, and some pseudorandom measure $\nu$.

This theorem is a local polynomial analogue of [18, Proposition 5.3]. It will be
proven by a vast number of applications of the van der Corput lemma and the
Cauchy-Schwarz inequality following the standard PET induction scheme; the idea
is to first apply the van der Corput lemma repeatedly to linearize the polynomials
$P_1, \dots, P_k$, and then apply Cauchy-Schwarz repeatedly to estimate the linearized
averages by local Gowers norms. The presence of the measure $\nu$ will cause a large
number of shifts of $\nu$ to appear as weights, but these will ultimately be controllable
via the polynomial forms condition (Definition 3.6). The final values of $d$ and $t$
obtained will be very large (indeed, they exhibit Ackermann-type behavior in the
maximal degree of $P_1, \dots, P_k$) but can be chosen to be small compared to $1/\eta_1$,
which controls the pseudorandomness of $\nu$.

The proof of Theorem 4.5 is elementary but rather lengthy (and notation-intensive),
and shall occupy all of Section 5. The $\nu = 1$ case of this theorem is a finitary version
of a similar result in while the linear case of this theorem (when the Pi — Pj 
are all linear) is essentially in ^SJ. Indeed the proof of this theorem will use a 
combination of the techniques from both of these papers. 

4.6. The local Koopman-von Neumann theorem. The third major component
of the proof of Theorem 3.16 is the following structure theorem.

Theorem 4.7 (Structure theorem). Let the notation and assumptions be as in
Section 2. Let $t \geq 0$, $d \geq 2$ be of size $O(1)$, and let $\vec Q \in \mathbf{Z}[h_1, \ldots, h_t, W]^d$ be
polynomials of degree $O(1)$ and coefficients $O(1)$ (with none of the components of
$\vec Q$ vanishing). Then given any pseudorandom measure $\nu$ and any $g : X \to \mathbf{R}^+$ with
the pointwise bound $0 \leq g \leq \nu$, there exist functions $g_{U^\perp}, g_U : X \to \mathbf{R}$ with the
pointwise bound

(26)  $0 \leq g_{U^\perp}(x) + g_U(x) \leq g(x)$

of g outside of this exceptional set obeying the following estimates:

• (Boundedness of structured component) We have the pointwise bound

(27)  $0 \leq g_{U^\perp}(x) \leq 1$.

• ($g_{U^\perp}$ captures most of the mass) We have

(28)  $\int_X g_{U^\perp} \geq \int_X g - O(\eta_4) - o(1)$.

• (Uniformity of unstructured component) We have the bound

(29)  $\| g_U \|_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}} \leq \eta_4^{1/2^d} + o(1)$.

Remark 4.8. Note the first appearance of the parameter $\eta_4$, which is controlling
the accuracy of this structure theorem. One can make this accuracy as strong as
desired, but at the cost of pushing $\eta_7$ (and thus $H$) down, which will ultimately
worsen many of the $o(1)$ errors appearing here and elsewhere.



Theorem 4.7 is the most technical and difficult component of the entire paper,
and is proven in Sections 6 and 7. It is a "finitary ergodic theory" argument which
relies on iterating a certain "dichotomy between structure and randomness". Here,
the randomness is measured using the local Gowers uniformity norm $U^{\vec Q([H]^t, W)}_{\sqrt{M}}$.
To measure the structured component we need the machinery of dual functions,
as in [18], together with an energy incrementation argument which we formalize
abstractly in Theorem 7.1. A key point will be that $\nu - 1$ is "orthogonal" to these
dual functions in a rather strong sense (see Proposition 6.5), which will be the key
to approximating functions bounded by $\nu$ with functions bounded by 1. This will
be accomplished by a rather tricky series of applications of the Cauchy-Schwarz 
inequality and will rely heavily on the polynomial correlation condition (Definition 
EH). 



4.9. Proof of Theorem 3.16. Using Theorem 4.5 and Theorem 4.7 we can now
quickly prove Theorem 3.16 (and hence Theorem 1.3, assuming Theorem 3.18),
following the same argument as in [18].

Let the notation and assumptions be as in Section 2. Let $\nu$ be a pseudorandom
measure, and let $g : X \to \mathbf{R}$ obey the pointwise bound $0 \leq g \leq \nu$ and (20).
Let $d \geq 2$, $t \geq 0$ and $\vec Q$ be as in Theorem 4.5; these expressions depend only on
$P_1, \ldots, P_k$ and so we do not need to explicitly track their influence on the $O()$ and
$o()$ notation. Applying Theorem 4.7 we thus obtain functions $g_U, g_{U^\perp}$ obeying the
properties claimed in that theorem. From (26) we have

$\mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g \cdots T^{P_k(Wm)/W} g \geq \mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} (g_{U^\perp} + g_U) \cdots T^{P_k(Wm)/W} (g_{U^\perp} + g_U)$.

We expand the right-hand side into $2^k = O(1)$ terms. Consider any of the $2^k - 1$ of
these terms which involves at least one factor of $g_U$. From (26), (27) we know that
$g_U$ and $g_{U^\perp}$ are both bounded pointwise in magnitude by $\nu + 1 + o(1)$, which is
$O(\nu + 1)$ when $N$ is large enough. Thus by Theorem 4.5 and (29), the contribution
of all of these terms can be bounded in magnitude by

$\ll \| g_U \|^c_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}} + o(1) \ll \eta_4^{c/2^d} + o(1)$

for some $c > 0$ depending only on $P_1, \ldots, P_k$. On the other hand, from (28), (20)
and the choice of parameters we have

$\int_X g_{U^\perp} \geq \eta_3 / 2$.

Applying this, (27), and Theorem 3.2 we obtain

$\mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g_{U^\perp} \cdots T^{P_k(Wm)/W} g_{U^\perp} \geq c(\eta_3 / 2) > 0$.

Putting all this together we conclude

$\mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g \cdots T^{P_k(Wm)/W} g \geq c(\eta_3 / 2) - O(\eta_4^{c/2^d}) - o(1)$.

As $\eta_4$ is chosen small compared to $\eta_3$, Theorem 3.16 follows.

5. Proof of generalized von Neumann theorem

In this section we prove Theorem 4.5. In a nutshell, our argument here will be a
rigorous implementation of the following scheme:

polynomial average $\ll$ weighted linear average $+ o(1)$ (van der Corput)
$\ll$ weighted parallelopiped average $+ o(1)$ (weighted gen. von Neumann)
$\ll$ unweighted parallelopiped average $+ o(1)$. (Cauchy-Schwarz)

The argument is based upon that used to prove [18, Proposition 5.3], namely
repeated application of the Cauchy-Schwarz inequality to replace various functions $g_i$
by $\nu$ (or $\nu + 1$), followed by application of the polynomial forms condition
(Definition 3.6) to replace the resulting polynomial averages of $\nu$ with $1 + o(1)$. The major
new ingredient in the argument compared to [18] will be the polynomial ergodic
theorem (PET) induction scheme (used for instance in [Sj) in order to estimate
the polynomial average in (25) by a linear average similar to that treated in [18,
Proposition 5.3]. After using PET induction to achieve this linearization, the rest
of the proof is broadly similar to that in [18, Proposition 5.3], except for the fact
that the shift parameters are restricted to be of size $M$ or $\sqrt{M}$ rather than $N$,
and that there is also some additional averaging over short shift parameters of size
$O(H)$.

The arguments are elementary and straightforward, but will require a rather large
amount of new notation in order to keep track of all the weights and factors created
by applications of the Cauchy-Schwarz inequality. Fortunately, none of this notation
will be needed in any other section; indeed, this section can be read independently of
the rest of the paper (although it of course relies on the material in earlier sections,
and also on Appendix A).

We begin with some simple reductions. First observe (as in [18]) that Lemma 3.14
allows us to replace the hypotheses $|g_i| \leq 1 + \nu$ by the slightly stronger $|g_i| \leq \nu$, at
the (acceptable) cost of worsening the implied constant in (25) by a factor of $2^k$.
Next, we claim that it suffices to find $d$, $t$, $c$ and $\vec Q \in \mathbf{Z}[h_1, \ldots, h_t, W]^d$ for which
we have the weaker estimate

(30)  $\left| \mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g_1 \cdots T^{P_k(Wm)/W} g_k \right| \ll \| g_1 \|^c_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}} + o(1)$

(i.e. we only control the average using the norm of $g_1$, rather than the best norm
of all the $g_i$). Indeed, if we could show this, then by symmetry we could find $d_i$, $t_i$, $c_i$
and $\vec Q_i \in \mathbf{Z}[h_1, \ldots, h_{t_i}, W]^{d_i}$ for $i = 1, \ldots, k$ such that

$\left| \mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g_1 \cdots T^{P_k(Wm)/W} g_k \right| \ll \| g_i \|^{c_i}_{U^{\vec Q_i([H]^{t_i}, W)}_{\sqrt{M}}} + o(1)$

whenever $1 \leq i \leq k$ and $\nu$ is pseudorandom. The claim then follows by using
Lemma A.3 to obtain a local Gowers norm $U^{\vec Q([H]^t, W)}_{\sqrt{M}}$ which dominates each of
the individual norms $U^{\vec Q_i([H]^{t_i}, W)}_{\sqrt{M}}$, and taking $c := \min_{1 \leq i \leq k} c_i$. (Note that the
pointwise bound $|g_i| \leq \nu$ and the polynomial forms condition easily imply that the
$U^{\vec Q_i([H]^{t_i}, W)}_{\sqrt{M}}$ norm of $g_i$ is $O(1)$.)

It remains to prove (30). It should come as no surprise to the experts that this
type of "generalized von Neumann" theorem will be proven via a large number 
of applications of van der Corput's lemma and the Cauchy-Schwarz inequality. In 
order to keep track of the intermediate multilinear expressions which arise during 
this process, it is convenient to prove a substantial generalization of this estimate. 
We first need the notion of a polynomial system, and averages associated to such 
systems. 

Definition 5.1 (Polynomial system). A polynomial system $S$ consists of the
following objects:

• An integer $D \geq 0$, which we call the number of fine degrees of freedom;

• A non-empty finite index set $A$ (the elements of which we shall refer to as
nodes of the system);

• A polynomial $R_\alpha \in \mathbf{Z}[m, h_1, \ldots, h_D, W]$ in $D + 2$ variables attached to
each node $\alpha \in A$;

• A distinguished node $\alpha_0 \in A$;

• A (possibly empty) collection $A' \subset A \setminus \{\alpha_0\}$ of inactive nodes. The nodes in
$A \setminus A'$ will be referred to as active, thus for instance the distinguished node
$\alpha_0$ is always active.

We say that a node $\alpha \in A$ is linear if $R_\alpha - R_{\alpha_0}$ is at most linear in $m$, thus the
distinguished node is always linear. We say that the entire system $S$ is linear if
every active node is linear. We make the following non-degeneracy assumptions:

• If $\alpha, \beta$ are distinct nodes in $A$, then $R_\alpha - R_\beta$ is not constant in $m, h_1, \ldots, h_D$.

• If $\alpha, \beta$ are distinct linear nodes in $A$, then $R_\alpha - R_\beta$ is not constant in $m$.

Given any two nodes $\alpha, \beta$, we define the distance $d(\alpha, \beta)$ between the two nodes to
be the $m$-degree of the polynomial $R_\alpha - R_\beta$ (which is non-zero by hypothesis); thus
this distance is symmetric, non-negative, and obeys the non-archimedean triangle
inequality

$d(\alpha, \gamma) \leq \max(d(\alpha, \beta), d(\beta, \gamma))$.

Note that $\alpha$ is linear if and only if $d(\alpha, \alpha_0) \leq 1$, and furthermore we have $d(\alpha, \beta) = 1$
for any two distinct linear nodes $\alpha, \beta$.

Example 5.2. Take $D := 0$, $A := \{1, 2, 3\}$, with $R_1 := 0$, $R_2 := m$, and $R_3 := m^2$
with distinguished node 3. Then the 3 node is linear and the other two are
non-linear. (If the distinguished node was 1 or 2, the situation would be reversed.)

Remark 5.3. The non-archimedean semi-metric is naturally identifiable with a tree
whose terminal nodes are the nodes of $S$, and whose intermediate nodes are balls
with respect to this semi-metric; the distance between two nodes is then the height
of their join. It is this tree structure (and the distinction of nodes into active,
inactive, and distinguished nodes) which shall implicitly govern the dynamics of
the PET induction scheme which we shall shortly perform. We will however omit
the details, as we shall not explicitly use this tree structure in this paper.

Definition 5.4 (Realizations and averages). Let $S$ be a polynomial system, and $\nu$
be a measure. We define a $\nu$-realization $\vec f = (f_\alpha)_{\alpha \in A}$ of $S$ to be an assignment of
a function $f_\alpha : X \to \mathbf{R}$ to each node $\alpha$ with the following properties:

• For any node $\alpha$, we have the pointwise bound $|f_\alpha| \leq \nu$.

• For any inactive node $\alpha$, we have $f_\alpha = \nu$.

We refer to the function $f_{\alpha_0}$ attached to the distinguished node $\alpha_0$ as the
distinguished function. We define the average $A_S(\vec f) \in \mathbf{R}$ of a system $S$ and its
$\nu$-realization $\vec f$ to be the quantity

$A_S(\vec f) := \mathbf{E}_{h_1, \ldots, h_D \in [H]} \mathbf{E}_{m \in [M]} \int_X \prod_{\alpha \in A} T^{R_\alpha(m, h_1, \ldots, h_D, W)} f_\alpha$.

Example 5.5. If $S$ is the system in Example 5.2, then

(31)  $A_S(\vec f) = \mathbf{E}_{m \in [M]} \int_X f_1 T^m f_2 T^{m^2} f_3$.

Example 5.6. The average

$\mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} \int_X \nu T^{m+h} f_2 T^{m+h'} f_2 T^{(m+h)^2} f_3 T^{(m+h')^2} f_3$

can be written in the form $A_S(\vec f)$ with distinguished function $f_3$, where $S$ is a system
with $D := 2$, $A := \{1, 2, 2', 3, 3'\}$ with the 1 node inactive and distinguished node
3, with $R_1 := 0$, $R_2 := m + h_1$, $R_{2'} := m + h_2$, $R_3 := (m + h_1)^2$, $R_{3'} := (m + h_2)^2$,
and $\vec f$ is given by $f_1 := \nu$, $f_{2'} := f_2$, and $f_{3'} := f_3$.

Example 5.7 (Base example). Let $S$ be the system with $D := 0$, $A := \{1, \ldots, k\}$,
$\alpha_0 := 1$, $A' := \emptyset$ (thus all nodes are active), and $R_j := P_j(Wm)/W$. We observe
from @ that this is indeed a system. Then $\vec f := (g_1, \ldots, g_k)$ is a $\nu$-realization of $S$
with distinguished function $g_1$, and

$A_S(\vec f) = \mathbf{E}_{m \in [M]} \int_X T^{P_1(Wm)/W} g_1 \cdots T^{P_k(Wm)/W} g_k$.

This system $S$ is linear if and only if the polynomials $P_i - P_j$ are all linear.

Remark 5.8 (Translation invariance). Given a polynomial system $S$ and a polynomial
$R \in \mathbf{Z}[m, h_1, \ldots, h_D, W]$, we can define the shifted polynomial system $S - R$
by replacing each of the polynomials $R_\alpha$ by $R_\alpha - R$; it is easy to verify that this
does not affect any of the characteristics of the system, and in particular we have
$A_S(\vec f) = A_{S - R}(\vec f)$ for any $\nu$-realization $\vec f$ of $S$ (and hence of $S - R$). This translation
invariance gives us the freedom to set any single polynomial $R_\alpha$ of our choosing
to equal 0; indeed we shall exploit this freedom whenever we wish to use van der
Corput's lemma or Cauchy-Schwarz to deactivate any given node.
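
As a sanity check on Definition 5.4 and Remark 5.8, the following Python sketch evaluates the average of Example 5.5 in a toy model and compares it with the average of the shifted system $S - R$ for $R = m^2$. Everything beyond the formula for the average is an assumption made for illustration: $X$ is taken to be $\mathbf{Z}/N\mathbf{Z}$ with $N = 11$, the shift is assumed to act by $T^n f(x) = f(x + n)$, the integral is the average over $X$, the parameter $W$ is suppressed, and the names `shift`, `average`, `S` and `S_shifted` are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 11   # toy model X = Z/NZ (assumption)
M = 5    # m ranges over [M] = {1, ..., M}
H = 2    # each fine shift h_j ranges over [H] = {1, ..., H}

def shift(f, n):
    """T^n f under the assumed convention T^n f(x) = f(x + n mod N)."""
    return [f[(x + n) % N] for x in range(N)]

def average(system, fs, D):
    """A_S(f): average over h in [H]^D, m in [M], x in X of prod_a T^{R_a} f_a(x)."""
    total, count = Fraction(0), 0
    for hs in product(range(1, H + 1), repeat=D):
        for m in range(1, M + 1):
            shifted = [shift(fs[a], R(m, *hs)) for a, R in system.items()]
            for x in range(N):
                term = 1
                for g in shifted:
                    term *= g[x]
                total += term
                count += 1
    return total / count

# Example 5.2: R_1 = 0, R_2 = m, R_3 = m^2, with D = 0.
S = {1: lambda m: 0, 2: lambda m: m, 3: lambda m: m * m}
# The shifted system S - R with R = m^2, which sets R_3 to zero.
S_shifted = {1: lambda m: -m * m, 2: lambda m: m - m * m, 3: lambda m: 0}

# Arbitrary integer-valued test functions on X (illustrative only).
fs = {
    1: [(x * x + 1) % 5 for x in range(N)],
    2: [(3 * x + 2) % 7 for x in range(N)],
    3: [(x ** 3) % 4 - 1 for x in range(N)],
}

value = average(S, fs, 0)
value_shifted = average(S_shifted, fs, 0)
print(value, value_shifted)
```

The two printed values agree exactly, because for each fixed $m$ the two integrands differ by the single shift $T^{-R}$, which does not change the average over $X$. Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.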

The estimate (30) then follows immediately from

Proposition 5.9 (Generalized von Neumann theorem for polynomial systems).
Let $S$ be a polynomial system with distinguished node $\alpha_0$. Then, if $\eta_1$ is sufficiently
small depending on $S, \alpha_0$, there exist $d, t \geq 0$, $\vec Q \in \mathbf{Z}[h_1, \ldots, h_t, W]^d$ and $c > 0$
depending only on $S, \alpha_0$ such that one has the bound

$|A_S(\vec f)| \ll_S \| f_{\alpha_0} \|^c_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}} + o_S(1)$

whenever $\nu$ is a pseudorandom measure and $\vec f$ is a $\nu$-realization of $S$ with
distinguished function $f_{\alpha_0}$.

Indeed, one simply applies this proposition to Example 5.7 to conclude (30).

It remains to prove Proposition 5.9. This will be done in three stages. The first
is the "linearization" stage, in which a weighted form of van der Corput's lemma
and the polynomial forms condition are applied repeatedly (using the PET induction
scheme) to reduce the proof of Proposition 5.9 to the case where the system
$S$ is linear. The second stage is the "parallelopipedization" stage, in which one
uses a weighted variant of the "Cauchy-Schwarz-Gowers inequality" to estimate
the average $A_S(\vec f)$ associated to a linear system $S$ by a weighted average of the
distinguished function $f_{\alpha_0}$ over parallelopipeds. Finally, there is a relatively simple
"Cauchy-Schwarz" stage in which the polynomial forms condition is used one last
time to replace the weights by the constant 1, at which point the proof of
Proposition 5.9 is complete. We remark that the latter two stages (dealing with the linear
system case) also appeared in [18]; the new feature here is the initial linearization
step, which can be viewed as a weighted variant of the usual polynomial generalized
von Neumann theorem (see e.g. (Hj). This linearization step is also the step which 
shall use the fine shifts h = 0(H) (for reasons which will be clearer in Section 
EJ; this should be contrasted with the parallelopipedization step, which relies on 
coarse-scale shifts $m = O(\sqrt{M})$.

5.10. PET induction and linearization. We now reduce Proposition 5.9 to the
linear case. We shall rely heavily here on van der Corput's lemma, which in practical 
terms allows us to deactivate any given node at the expense of duplicating all the 
other nodes in the system. Since this operation tends to increase the number 
of active nodes in the system, it is not immediately obvious that iterating this 
operation will eventually simplify the system. To make this more clear we need to 
introduce the notion of a weight vector. 

Definition 5.11 (Weight vector). A weight vector is an infinite vector $\vec w = (w_1, w_2, \ldots)$
of non-negative integers $w_i$, with only finitely many of the $w_i$ being non-zero. Given
two weight vectors $\vec w = (w_1, w_2, \ldots)$ and $\vec w' = (w'_1, w'_2, \ldots)$, we say that $\vec w < \vec w'$ if
there exists $k \geq 1$ such that $w_k < w'_k$, and such that $w_i = w'_i$ for all $i > k$. We say
that a weight vector is linear if $w_i = 0$ for all $i \geq 2$.

It is well known that the space of all weight vectors forms a well-ordered set; indeed,
it is isomorphic to the ordinal $\omega^\omega$. In particular we may perform strong induction
on this space. The space of linear weight vectors forms an order ideal; indeed, a
weight is linear if and only if it is less than $(0, 1, 0, \ldots)$.
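
The ordering of Definition 5.11 compares two weight vectors at the largest index where they differ. The short Python sketch below spells this out; the function names `weight_less` and `is_linear` are hypothetical, and weight vectors are represented as finite tuples whose missing entries are read as zero.

```python
def weight_less(w, w_prime):
    """Return True if w < w' in the ordering of Definition 5.11.

    Weight vectors are finite tuples; entries beyond the end count as zero.
    """
    n = max(len(w), len(w_prime))
    a = tuple(w) + (0,) * (n - len(w))
    b = tuple(w_prime) + (0,) * (n - len(w_prime))
    # Scan from the highest index down: the largest k at which the two
    # vectors differ decides the comparison.
    for k in reversed(range(n)):
        if a[k] != b[k]:
            return a[k] < b[k]
    return False  # the vectors are equal

def is_linear(w):
    """A weight vector is linear if w_i = 0 for all i >= 2."""
    return all(entry == 0 for entry in w[1:])

# The two weights appearing in Example 5.13.
print(weight_less((0, 1), (1, 1)))
# A linear weight, however large its first entry, lies below (0, 1, 0, ...).
print(weight_less((100,), (0, 1)))
```

This is the reverse lexicographic comparison, which is why a strong induction on weights can ignore what happens to the low-order components once a higher component has decreased.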

Definition 5.12 (Weight). Let $S$ be a polynomial system, and let $\alpha$ be a node in $S$
(in practice this will not be the distinguished node $\alpha_0$). We say that two nodes $\beta, \gamma$
in $S$ are equivalent relative to $\alpha$ if $d(\beta, \gamma) < d(\alpha, \beta)$. This is an equivalence relation
on the nodes of $S$, and the equivalence classes have a well-defined distance to $\alpha$.
We define the weight vector $\vec w_\alpha(S)$ of $S$ relative to $\alpha$ by setting the $i$th component
for any $i \geq 1$ to equal the number of equivalence classes at distance $i$ from $\alpha$.

Example 5.13. Consider the system in Example 5.2. The weight of this system
relative to the 1 node is $(1, 1, 0, \ldots)$, whereas the weight of the system in Example
5.6 relative to the 2 node is $(0, 1, 0, \ldots)$ (note that the inactive node 1 is not relevant
here, nor is the node $2'$ which has distance 0 from 2), which is a lower weight than
that of the previous system.
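
The distances and weights in Examples 5.2, 5.6 and 5.13 can be recomputed mechanically. The Python sketch below does so under stated assumptions: polynomials are stored as dictionaries mapping exponent triples (for $m$, $h_1$, $h_2$) to integer coefficients, the variable $W$ is suppressed, and only active nodes are counted in the weight vector, as Example 5.13 suggests. All function names are hypothetical.

```python
from itertools import permutations

def sub(p, q):
    """Difference p - q of polynomials stored as {(e_m, e_h1, e_h2): coefficient}."""
    out = dict(p)
    for mono, c in q.items():
        out[mono] = out.get(mono, 0) - c
    return {mono: c for mono, c in out.items() if c != 0}

def distance(system, a, b):
    """d(a, b): the m-degree of R_a - R_b, for distinct nodes a and b."""
    return max(mono[0] for mono in sub(system[a], system[b]))

def linear_nodes(system, alpha0):
    """Nodes a with R_a - R_{alpha0} at most linear in m."""
    return {a for a in system if a == alpha0 or distance(system, a, alpha0) <= 1}

def ultrametric_ok(system):
    """Check d(a, c) <= max(d(a, b), d(b, c)) over all distinct triples."""
    return all(distance(system, a, c) <= max(distance(system, a, b), distance(system, b, c))
               for a, b, c in permutations(system, 3))

def weight_vector(system, inactive, alpha, length=4):
    """Weight vector relative to alpha, counting classes of active nodes only."""
    classes = []  # pairs (distance to alpha, representative node)
    for beta in system:
        if beta == alpha or beta in inactive:
            continue
        d = distance(system, alpha, beta)
        if d == 0:
            continue
        if not any(d == e and distance(system, beta, rep) < d for e, rep in classes):
            classes.append((d, beta))
    w = [0] * length
    for d, _ in classes:
        w[d - 1] += 1
    return tuple(w)

# Example 5.2: R_1 = 0, R_2 = m, R_3 = m^2.
S52 = {"1": {}, "2": {(1, 0, 0): 1}, "3": {(2, 0, 0): 1}}
# Example 5.6, with (m + h)^2 expanded as m^2 + 2mh + h^2.
S56 = {
    "1": {},
    "2": {(1, 0, 0): 1, (0, 1, 0): 1},
    "2'": {(1, 0, 0): 1, (0, 0, 1): 1},
    "3": {(2, 0, 0): 1, (1, 1, 0): 2, (0, 2, 0): 1},
    "3'": {(2, 0, 0): 1, (1, 0, 1): 2, (0, 0, 2): 1},
}

print(linear_nodes(S52, "3"), linear_nodes(S52, "1"))
print(weight_vector(S52, set(), "1"), weight_vector(S56, {"1"}, "2"))
```

In the second system the nodes 3 and $3'$ fall into a single class relative to node 2, because $R_3 - R_{3'}$ has $m$-degree 1 while both nodes are at distance 2 from node 2; this is what keeps the second component of the weight equal to 1.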

The key inductive step in the reduction to the linear case is then

Proposition 5.14 (Inductive step of linearization). Let $S$ be a polynomial system
with distinguished node $\alpha_0$ and a non-linear active node $\alpha$. If $\eta_1$ is sufficiently small
depending on $S, \alpha_0, \alpha$, then there exists a polynomial system $S'$ with distinguished
node $\alpha'_0$ and an active node $\alpha'$ with $\vec w_{\alpha'}(S') < \vec w_\alpha(S)$ with the following property:
given any pseudorandom measure $\nu$ and any $\nu$-realization $\vec f$ of $S$, there exists a
$\nu$-realization $\vec f'$ of $S'$ with the same distinguished function (thus $f_{\alpha_0} = f'_{\alpha'_0}$) such
that

(32)  $|A_S(\vec f)|^2 \ll A_{S'}(\vec f') + o_S(1)$.

Indeed, given this proposition, a strong induction on the weight vector $\vec w_\alpha(S)$
immediately implies that in order to prove Proposition 5.9 it suffices to do so for
linear systems (since these, by definition, are the only systems without non-linear
active nodes).

Before we prove this proposition in general, it is instructive to give an example. 

Example 5.15. Consider the expression (31) with $f_1, f_2, f_3$ bounded pointwise by
$\nu$. We rewrite this expression as

$A_S(\vec f) = \int_X f_1 \mathbf{E}_{m \in [M]} T^m f_2 T^{m^2} f_3$

and thus by Cauchy-Schwarz (94)

$|A_S(\vec f)|^2 \leq \left( \int_X \nu \right) \left( \int_X \nu \, |\mathbf{E}_{m \in [M]} T^m f_2 T^{m^2} f_3|^2 \right)$.

By (15) the first factor is $1 + o(1)$. Also, from van der Corput's lemma (Lemma
A.1) we have

$|\mathbf{E}_{m \in [M]} T^m f_2 T^{m^2} f_3|^2 \ll \mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} T^{m+h} f_2 T^{m+h'} f_2 T^{(m+h)^2} f_3 T^{(m+h')^2} f_3 + o(1)$.

We may thus conclude a bound of the form (32), where $A_{S'}(\vec f')$ is the quantity
studied in Example 5.6. Note from Example 5.13 that $S'$ has a lower weight than
$S$ relative to suitably chosen nodes.
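
The van der Corput step used above can be illustrated in a simplified cyclic model, where it reduces to shift invariance followed by Cauchy-Schwarz and carries no error term. The Python sketch below is only that model, not Lemma A.1 itself: the sequence lives on $\mathbf{Z}/M\mathbf{Z}$ rather than on the interval $[M]$ (so the boundary error $o(1)$ disappears), and the name `vdc_sides` is hypothetical.

```python
from fractions import Fraction
import random

def vdc_sides(x, H):
    """Both sides of the cyclic van der Corput inequality for x on Z/MZ.

    Returns (|E_m x_m|^2, E_{h,h' in [H]} E_m x_{m+h} x_{m+h'}) as exact fractions.
    """
    M = len(x)
    lhs = Fraction(sum(x), M) ** 2
    corr = 0
    for h in range(1, H + 1):
        for hp in range(1, H + 1):
            corr += sum(x[(m + h) % M] * x[(m + hp) % M] for m in range(M))
    rhs = Fraction(corr, H * H * M)
    return lhs, rhs

random.seed(0)
M, H = 30, 4
results = []
for _ in range(5):
    x = [random.randint(-3, 3) for _ in range(M)]
    results.append(vdc_sides(x, H))
for lhs, rhs in results:
    print(float(lhs), float(rhs))
```

In this model the inequality follows from writing $\mathbf{E}_m x_m = \mathbf{E}_m \mathbf{E}_{h \in [H]} x_{m+h}$ and applying Cauchy-Schwarz in $m$. The right-hand side involves only products of two shifted copies of the sequence, which is the mechanism that lowers the degree of the polynomial shifts in the PET induction.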

Proof of Proposition 5.14. By using translation invariance (Remark 5.8) we may
normalize $R_\alpha = 0$. We split $A = A_0 \cup A_1$, where $A_0 := \{\beta \in A : d(\alpha, \beta) = 0\}$ and
$A_1 := A \setminus A_0$. Since $\alpha$ is nonlinear, the distinguished node $\alpha_0$ lies in $A_1$. Then $R_\beta$ is
independent of $m$ for all $\beta \in A_0$: $R_\beta(m, h_1, \ldots, h_D, W) = R_\beta(h_1, \ldots, h_D, W)$. We
can then write

$A_S(\vec f) = \mathbf{E}_{h_1, \ldots, h_D \in [H]} \int_X F_{h_1, \ldots, h_D} \mathbf{E}_{m \in [M]} G_{m, h_1, \ldots, h_D}$

where

$F_{h_1, \ldots, h_D} := \prod_{\beta \in A_0} T^{R_\beta(h_1, \ldots, h_D, W)} f_\beta$

and

$G_{m, h_1, \ldots, h_D} := \prod_{\beta \in A_1} T^{R_\beta(m, h_1, \ldots, h_D, W)} f_\beta$.

Since $|f_\beta|$ is bounded pointwise by $\nu$, we have $|F_{h_1, \ldots, h_D}| \leq H_{h_1, \ldots, h_D}$ where

(33)  $H_{h_1, \ldots, h_D} := \prod_{\beta \in A_0} T^{R_\beta(h_1, \ldots, h_D, W)} \nu$

and thus by Cauchy-Schwarz

$|A_S(\vec f)|^2 \leq \left( \mathbf{E}_{h_1, \ldots, h_D \in [H]} \int_X H_{h_1, \ldots, h_D} \right) \times \mathbf{E}_{h_1, \ldots, h_D \in [H]} \int_X H_{h_1, \ldots, h_D} |\mathbf{E}_{m \in [M]} G_{m, h_1, \ldots, h_D}|^2$.

Since $\nu$ is pseudorandom and thus obeys the polynomial forms condition, we see
from Definition 3.6 and (33) (taking $\eta_1$ sufficiently small) that

$\mathbf{E}_{h_1, \ldots, h_D \in [H]} \int_X H_{h_1, \ldots, h_D} = 1 + o_S(1)$

(note by hypothesis that the $R_\beta - R_{\beta'}$ are not constant in $m, h_1, \ldots, h_D$). Since
we are always assuming $N$ to be large, the $o_S(1)$ error is bounded. Thus we reduce
to showing that

$\mathbf{E}_{h_1, \ldots, h_D \in [H]} \int_X H_{h_1, \ldots, h_D} |\mathbf{E}_{m \in [M]} G_{m, h_1, \ldots, h_D}|^2 \ll A_{S'}(\vec f') + o_S(1)$

for some suitable $S'$, $\vec f'$. But by the van der Corput lemma (Lemma A.1) (using
(16) to get the upper bounds on $G_{m, h_1, \ldots, h_D}$), we have

$|\mathbf{E}_{m \in [M]} G_{m, h_1, \ldots, h_D}|^2 \ll \mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} G_{m+h, h_1, \ldots, h_D} G_{m+h', h_1, \ldots, h_D} + o(1)$

and so to finish the proof it suffices to verify that the expression

(34)  $\mathbf{E}_{h_1, \ldots, h_D, h, h' \in [H]} \mathbf{E}_{m \in [M]} \int_X H_{h_1, \ldots, h_D} G_{m+h, h_1, \ldots, h_D} G_{m+h', h_1, \ldots, h_D}$

is of the form $A_{S'}(\vec f')$ for some suitable $S'$ and $\vec f'$. By inspection we see that we
can construct $S'$ and $\vec f'$ as follows:

• We have $D + 2$ fine degrees of freedom, which we label $h_1, \ldots, h_D, h, h'$;

• The nodes $A'$ of $S'$ are $A' := A_0 \cup A_1 \cup A'_1$, where $A'_1$ is another copy of $A_1$
(disjoint from $A_0 \cup A_1$), with distinguished node $\alpha'_0 := \alpha_0 \in A_1$.

• We choose the node $\alpha'$ to be an active node of $S$ in $A_1$ which has minimal
distance to $\alpha$. (Note that $A_1$ always contains at least one active node,
namely $\alpha_0$.)

• If $\beta \in A_0$, then $\beta$ is inactive in $S'$, with $R'_\beta(m, h_1, \ldots, h_D, h, h', W) :=
R_\beta(m, h_1, \ldots, h_D, W)$ and $f'_\beta := \nu$;

• If $\beta \in A_1$, then $\beta$ is inactive in $S'$ if and only if it is inactive in $S$, with
$R'_\beta(m, h_1, \ldots, h_D, h, h', W) := R_\beta(m + h, h_1, \ldots, h_D, W)$, and $f'_\beta := f_\beta$;

• If $\beta' \in A'_1$ is the counterpart of some node $\beta \in A_1$, then $\beta'$ is inactive in
$S'$ if and only if $\beta$ is inactive in $S$, with $R'_{\beta'}(m, h_1, \ldots, h_D, h, h', W) :=
R_\beta(m + h', h_1, \ldots, h_D, W)$, and $f'_{\beta'} := f_\beta$.

It is then straightforward to verify that $S'$ is a polynomial system, that $\vec f'$ is a
realization of $S'$, and that (34) is equal to $A_{S'}(\vec f')$. It remains to show that
$\vec w_{\alpha'}(S') < \vec w_\alpha(S)$. Let $d$ be the degree in $m$ of $R_\alpha - R_{\alpha'}$, thus $d \geq 1$. One
easily verifies that the $i$th component of $\vec w_{\alpha'}(S')$ will be equal to that of $\vec w_\alpha(S)$ for
$i > d$, and equal to one less than that of $\vec w_\alpha(S)$ when $i = d$ (basically due to the
deactivation of all the nodes at $A_0$). The claim follows. (The behavior of these
weight vectors for $i < d$ is much more complicated, but is fortunately not relevant
due to our choice of ordering on weight vectors.) □

5.16. Parallelopipedization. By the preceding discussion, we see that to prove
Proposition 5.9 it suffices to do so in the case where $S$ is linear. To motivate the
argument let us first work through an unweighted example (with $\nu = 1$).

Example 5.17 (Unweighted linear case). Consider the linear average

$A_S(\vec f) = \mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} \int_X f_0 T^{hm} f_1 T^{h'm} f_2$

with distinguished function $f_0$, and with $|f_0|, |f_1|, |f_2|$ bounded pointwise by 1. We
introduce some new coarse-scale shift parameters $m_1, m_2 \in [\sqrt{M}]$. By shifting $m$
to $m - m_1 - m_2$ one can express the above average as

$\mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} \int_X \mathbf{E}_{m_1, m_2 \in [\sqrt{M}]} f_0 T^{h(m - m_1 - m_2)} f_1 T^{h'(m - m_1 - m_2)} f_2 + o(1)$

and then shifting the integral by $T^{h m_1 + h' m_2}$ we obtain

$\mathbf{E}_{h, h' \in [H]} \mathbf{E}_{m \in [M]} \int_X \mathbf{E}_{m_1, m_2 \in [\sqrt{M}]} T^{h m_1 + h' m_2} f_0 T^{h(m - m_2) + h' m_2} f_1 T^{h'(m - m_1) + h m_1} f_2 + o(1)$.

The point is that the factor $T^{h(m - m_2) + h' m_2} f_1$ does not depend on $m_1$, while
$T^{h'(m - m_1) + h m_1} f_2$ does not depend on $m_2$. One can then use the Cauchy-Schwarz-Gowers
inequality (see e.g. [20, Corollary B.3]) to estimate this expression by

The main term here can then be recognized as a local Gowers norm (24).

Now we return to the general linear case. Here we will need to address the presence
of many additional weights which are all shifted versions of the measure $\nu$,
which requires the repeated use of weighted Cauchy-Schwarz inequalities. See [18,
§5] for a worked example of this type of computation. Our arguments here shall
instead follow those of [20, §C], in particular relying on the weighted generalized
von Neumann inequality from that paper (reproduced here as Proposition A.2).

We turn to the details. To simplify the notation we write $\vec h := (h_1, \ldots, h_D)$.
We use the translation invariance (Remark 5.8) to normalize
$R_{\alpha_0} = 0$. We then split $A = \{\alpha_0\} \cup A_l \cup A_{nl}$, where $A_l$ consists of all the linear
nodes, and $A_{nl}$ all the nonlinear (and hence inactive) nodes. By the non-degeneracy
assumptions in Definition 5.1 we may write

$R_\alpha = b_\alpha m + c_\alpha$

for all $\alpha \in A_l$ and some $b_\alpha, c_\alpha \in \mathbf{Z}[\vec h, W]$ with the $b_\alpha$ all distinct and non-zero. We
can then write

$A_S(\vec f) = \mathbf{E}_{\vec h \in [H]^D} \mathbf{E}_{m \in [M]} \int_X f_{\alpha_0} \left( \prod_{\alpha \in A_{nl}} T^{R_\alpha(m, \vec h, W)} \nu \right) \prod_{\alpha \in A_l} T^{b_\alpha m + c_\alpha} f_\alpha$.

We introduce some new coarse-scale shift parameters $m_\alpha \in [\sqrt{M}]$ for $\alpha \in A_l$, thus
the vector $\vec m := (m_\alpha)_{\alpha \in A_l}$ lies in $[\sqrt{M}]^{A_l}$. We shift $m$ to $m - \sum_{\alpha \in A_l} m_\alpha$ and

observe that 

whenever m a S [VM] and x m = E (N £ ) for all e > 0. Averaging this in m (cf. 
(I§fj|> ) we obtain 

E me[Mpm = E me[\/Mpi E me[AfFm-£ aeA! m a + o(l). 

Applying this (and ). and shifting the integral by the polynomial 
(35) Q := ]T b a (h,W)m a , 

a£A t 

we obtain 

A 5 (/) = E / - £[ff]D E me[M] / EtfiE[V^ A lfa ,m,h,m,W II f a ,m.h.m,W + 

where 

"Q T fl Q (m-£ QeA; m a ,h,W) 
f _ ._ T b a (h,W)rn+c a (h,W)+Y. (ieAl (b(3{h,W)-b a (h,W))rn a , 

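The point of all these manipulations is that for each linear node $\alpha \in A_l$, $f_{\alpha, m, \vec h, \vec m, W}$
is independent of the coarse-scale parameter $m_\alpha$. Also observe the pointwise bound

$|f_{\alpha, m, \vec h, \vec m, W}| \leq \nu_{\alpha, m, \vec h, \vec m, W}$

where

$\nu_{\alpha, m, \vec h, \vec m, W} := T^{b_\alpha(\vec h, W) m + c_\alpha(\vec h, W) + \sum_{\beta \in A_l} (b_\beta(\vec h, W) - b_\alpha(\vec h, W)) m_\beta} \nu$.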

By applying the weighted generalized von Neumann theorem (Proposition A.2) in
the $\vec m$ variables we thus have

(36) 

|A 5 (/)| < E ne[H]D E me[M] / \\f ao ^s,-,w\\a A i(y) II \W a ,mX;w\\nAM°>y 



Jan.h,rn.W 



We now claim the estimate 



( 37 ) E Ke[/fp E ™e[M] J \W a ,m,h,-,w\\oA\w < 1 

for each $\alpha \in A_l$. Indeed, the left-hand side can be expanded as

E Ke[fl"] r)E 'n< ','n< 1 'e[H] j4 i E TO6[M] / H 

X u,e{o,i} A AM 

T b a (hW) m +c a 0i,W)+E (ieAl ( b llfi,W)-b a (h,W))7n^' 3 ' > i/ 

The distinctness of the $b_\beta$ ensures that the polynomial shifts of $\nu$ here are all
distinct, and so by the polynomial forms condition (Definition 3.6) we obtain the
claim (taking $\eta_1$ suitably small).

In view of (36), (37), and Hölder's inequality, we see that

|2|^i ^1/2 

Thus to prove Proposition 5.9 it suffices to show that



A s (f)\ «s (E Ke[Hp E me[M] / ||/ ao , m ,g,, w ||^; M ) 1/2 

« x 



(38) E^ e[ff]D E me[M] / ||/ Q0)m ^ w || n A i(l A ~~ ll/"oll q([h]D,»-) + °s(!) 

for some suitable $\vec Q \in \mathbf{Z}[h_1, \ldots, h_D, W]$. The left-hand side of (38) can be expanded
as a weighted average of $f_{\alpha_0}$ over parallelopipeds, or more precisely as

(39) E femD E. (0)! . (1)6[ ^„ / ( J] r«»^ M ^/ ao ) w (M (0 U (1) ) 

x ue{o,i} A i 

where :— (m < ^ a ' > ) ae Ai and w(h,rh^°\rh^) is the weight 

we{o,i} A i aeA,,, 



5.18. Final Cauchy-Schwarz. Let us temporarily drop the weight $w$ in (39) and
consider the unweighted average



E i6[ff] DE m(») ; mWe[v'i] J i 



l J~J T Qo(h.m^ .,W) j a 

" X - r ~ - -> A . 



Using this is 



which on comparison with l|24|) is indeed of the form ||/ Qo || 2 q\ [h] d w) for some 



2i. 



Q £ Z[hi, . . . , he, W]; note that because the & Q are non-zero, all the components 
of Q are non-zero. Thus it suffices to show that 

^m^m^ivm^ [ ( II T Qo( ^ M ' ff) / a „)(^(M (0 U (1) )-i) = O 5(i). 



x we{o,i} J ! 



Applying Cauchy-Schwarz (94) with the pointwise bound $|f_{\alpha_0}| \leq \nu$ we reduce to
showing

Efc [fl|D E, m ,, (1)e[ ^J ( I] ^o(^ , ^),)( w (/J, 77 i(o)^(i))-l) J= o J + 05 (l) 

for $j = 0, 2$ (with the usual convention $0^0 = 1$) which in turn will follow if we can
show



E ftG [H] d E m(°) .mW G [%/M]- 



f ( TJ ^M-'.nO^^W^d))) = l+o 5 (l) 

X _ f„ , ! A, 



26 There is the (incredibly unlikely) possibility that $D = 0$ or $D = 1$, but by using the
monotonicity of the Gowers norms (Lemma A.3) one can easily increase $D$ to avoid this.






for $j = 0, 1, 2$. Let us just demonstrate this in the hardest case $j = 2$, as it will
be clear from the proof that the same argument also works for $j = 0, 1$ (as they
involve fewer factors of $\nu$). We expand the left-hand side as



K G [H] DCj m(0) ,m(i) e [^/M] A i ^m,m'£[M] 



X 



,A l -"> ,W) 



V) 



uj£{0,1} a i 

J"J J"J T Qo(h,m^\W)+F(. a (m~J2 aeAl m a ,h,W) ^Q (ft,m ( "' , W)+R a (m'-VJ QeAj m a ,h,W) 

ue{o,i}"! aeA nl 

One can then invoke the polynomial forms condition (Definition 3.6) one last time
(again taking $\eta_1$ small enough) to verify that this is indeed $1 + o_S(1)$. Note that as
every node in $A_{nl}$ is non-linear, the polynomials $R_\alpha$ have degree at least 2, which
ensures that the polynomials used to shift $\nu$ here are all distinct. This concludes
the proof of Proposition 5.9 in the linear case, and hence in general, and Theorem
4.5 follows. ■

Remark 5.19. One can define polynomial systems and weights (Definitions 5.1,
5.12) for systems of multivariable polynomials $R_\alpha \in \mathbf{Z}[m_1, \ldots, m_r, h_1, \ldots, h_D, W]$
(see for example [22]). Following the steps of the PET induction (5.10) and
parallelopipedization (5.16) one can prove a multivariable version of the polynomial
generalized von Neumann theorem (Theorem 4.5).



6. Polynomial dual functions 



This section and the next will be devoted to the proof of the structure theorem,
Theorem 4.7. In these sections we shall assume the notation of Section 2, and fix the
bounded quantities $t \geq 0$, $d \geq 2$ and $\vec Q \in \mathbf{Z}[h_1, \ldots, h_t, W]^d$. As they are bounded
we may permit all implicit constants in the $o()$ and $O()$ notation to depend on these
quantities. We also fix the pseudorandom measure $\nu$. We shall abbreviate

$\| f \|_U := \| f \|_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}}$.

Roughly speaking, the objective here is to split any non-negative function bounded
pointwise by $\nu$ into a non-negative function bounded pointwise by 1, plus an error
which is small in the $\| \cdot \|_U$ norm. For technical reasons (as in [18]) we will also need
to exclude a small exceptional set of measure $o(1)$, of which more will be said later.

Following [18], our primary tool for understanding the $U$ norm shall be via the
concept of a dual function of a function $f$ associated to this norm.

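Definition 6.1 (Dual function). If $f : X \to \mathbf{R}$ is a function, we define the dual
function $\mathcal{D} f : X \to \mathbf{R}$ by the formula

$\mathcal{D} f := \mathbf{E}_{\vec h \in [H]^t} \mathbf{E}_{\vec m^{(0)}, \vec m^{(1)} \in [\sqrt{M}]^d} \prod_{(\omega_1, \ldots, \omega_d) \in \{0,1\}^d \setminus \{0\}^d} T^{\sum_{i \in [d]} (m_i^{(\omega_i)} - m_i^{(0)}) Q_i(\vec h, W)} f$

where $\vec m^{(i)} = (m_1^{(i)}, \ldots, m_d^{(i)})$ for $i = 0, 1$.
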
From (24) and the translation invariance of the integral $\int_X$ we obtain the
fundamental relationship

(40)  $\| f \|_U^{2^d} = \int_X f \, \mathcal{D} f$.

Thus we have a basic dichotomy: either $f$ has small $U$ norm, or else it correlates
with its own dual function$^{27}$. As in [18], it is the iteration of this dichotomy via a
stopping time argument which shall power the proof of Theorem 4.7.

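For future reference we observe the trivial but useful facts that $\mathcal{D}$ is monotone and
homogeneous of degree $2^d - 1$:

(41)  $|f| \leq g$ pointwise $\implies$ $|\mathcal{D} f| \leq \mathcal{D} g$ pointwise;

(42)  $\mathcal{D}(\lambda f) = \lambda^{2^d - 1} \mathcal{D} f$ for all $\lambda \in \mathbf{R}$.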
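
The relations (40), (41) and (42) can be checked by brute force in a small model. The Python sketch below is illustrative only and rests on several assumptions: $X = \mathbf{Z}/7\mathbf{Z}$ with $T^n f(x) = f(x + n)$, $d = 2$, $t = 1$, hypothetical components $Q_1(h) = h$ and $Q_2(h) = h^2 + 1$ with $W$ suppressed, a small range `L` standing in for $\sqrt{M}$, and the $2^d$-th power of the $U$ norm taken to be the average of the product over all of $\{0,1\}^d$ (the form we assume for (24), which is not restated in this section).

```python
from fractions import Fraction
from itertools import product

N, H, L, d = 7, 2, 2, 2                  # X = Z/NZ, h in [H], m_i in [L]
Q = [lambda h: h, lambda h: h * h + 1]   # hypothetical Q_1, Q_2 (t = 1)

def T(f, n):
    """Assumed shift convention: T^n f(x) = f(x + n mod N)."""
    return [f[(x + n) % N] for x in range(N)]

def params():
    """All triples (h, m0, m1) with h in [H] and m0, m1 in [L]^d."""
    rng = range(1, L + 1)
    return [(h, m0, m1) for h in range(1, H + 1)
            for m0 in product(rng, repeat=d) for m1 in product(rng, repeat=d)]

def dual(f):
    """Dual function of Definition 6.1: product over {0,1}^d minus the zero vertex."""
    P = params()
    out = [Fraction(0)] * N
    for h, m0, m1 in P:
        term = [1] * N
        for omega in product((0, 1), repeat=d):
            if not any(omega):
                continue  # skip the vertex (0, ..., 0)
            n = sum(((m1[i] if omega[i] else m0[i]) - m0[i]) * Q[i](h) for i in range(d))
            g = T(f, n)
            term = [term[x] * g[x] for x in range(N)]
        out = [out[x] + term[x] for x in range(N)]
    return [v / len(P) for v in out]

def norm_power(f):
    """Assumed form of ||f||_U^(2^d): average of the product over all of {0,1}^d."""
    P = params()
    total = Fraction(0)
    for h, m0, m1 in P:
        prod_f = [1] * N
        for omega in product((0, 1), repeat=d):
            n = sum((m1[i] if omega[i] else m0[i]) * Q[i](h) for i in range(d))
            g = T(f, n)
            prod_f = [prod_f[x] * g[x] for x in range(N)]
        total += Fraction(sum(prod_f), N)
    return total / len(P)

def integral(f):
    return Fraction(sum(f), N)

f = [(x * x) % 5 - 2 for x in range(N)]   # arbitrary integer-valued test function
Df = dual(f)
print(norm_power(f), integral([f[x] * Df[x] for x in range(N)]))
```

The two printed values coincide, reflecting the change of variables by $\sum_i m_i^{(0)} Q_i$ that underlies (40). Scaling `f` by a constant scales `dual(f)` by the $(2^d - 1)$-th power of that constant, as in (42), since the product in Definition 6.1 has $2^d - 1$ factors.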

We will need two key facts about dual functions, both of which follow primarily 
from the polynomial correlation condition. The first, which is fairly easy, is that 
dual functions are essentially bounded. 

Proposition 6.2 (Essential boundedness of dual functions). Let $f : X \to \mathbf{R}$ obey
the pointwise bound $|f| \leq \nu + 1$. Then for any integer $K \geq 1$ we have the moment
estimates

(43)  $\int_X |\mathcal{D} f|^K (\nu + 1) \leq 2 (2^{2^d - 1})^K + o_K(1)$.

In particular, if we define the global bad set

(44)  $\Omega_0 := \{ x \in X : \mathcal{D} \nu(x) \geq 2^{2^d} \}$

then we have the measure bound

(45)  $\int_X (\mathcal{D} \nu)^K 1_{\Omega_0} (\nu + 1) = o_K(1)$

for all $K \geq 0$, and the pointwise bound

(46)  $|\mathcal{D} f| (1 - 1_{\Omega_0}) \leq 2^{2^d}$.

Remark 6.3. In [18], the correlation conditions imposed on $\nu$ were strong enough
that one could bound the dual function $\mathcal{D} f$ uniformly by $2^{2^d - 1} + o(1)$, thus
removing the need for a global bad set $\Omega_0$. One could do something similar here by
strengthening the correlation condition. However, we were then unable to establish
Theorem 3.18, i.e. we were unable to construct a measure concentrated on almost
primes which obeyed this stronger correlation condition. The basic difficulty is
that the polynomials in $\vec Q$ could contain a number of common factors which could
significantly distort functions such as $\mathcal{D} \nu$ at some rare points (such as the origin).
Fortunately, the presence of a small global bad set does not significantly impact our
analysis (similarly to how sets of measure zero have no impact on ergodic theory),
especially given how it does not depend on $f$. In practice, $K$ will get as large as
$1/\eta_6$, but no greater.

27 In the language of infinitary ergodic theory, it will be the dual functions which generate
(in the measure-theoretic sense) the characteristic factor for the $U$ norm. The key points will be
that the dual functions are essentially bounded, and that $\nu - 1$ is essentially orthogonal to the
characteristic factor.



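Proof. We begin with (43). By (41), (42) it suffices to show that

$$\int_X \Big(\mathcal{D}\,\frac{\nu + 1}{2}\Big)^K\, \frac{\nu + 1}{2} = 1 + o_K(1);$$

in view of Lemma 3.14 it suffices to show that

$$\int_X (\mathcal{D}\nu)^K\, \nu = 1 + o_K(1).$$

The left-hand side can be expanded as

$$\mathbb{E}_{\vec h^{(1)}, \ldots, \vec h^{(K)} \in [H]^t} \int_X \nu \prod_{k \in [K]} \mathbb{E}_{\vec m^{(0)}, \vec m^{(1)} \in [\sqrt{M}]^d} \prod_{\omega \in \{0,1\}^d \setminus \{0\}^d} T^{\sum_{i \in [d]} (m_i^{(\omega_i)} - m_i^{(0)})\, Q_i(\vec h^{(k)}, W)}\, \nu.$$

But this is $1 + o_K(1)$ from (18) (with $D = d$, $D' = 0$, $D'' = Kt$, and $L = 1$, with the $\vec Q_{j,k}$ and $\vec S_l$ vanishing). This proves (43). From Chebyshev's inequality this implies that

$$\int_X (\mathcal{D}\nu)^{K'}\, 1_{\Omega_0}\, (\nu + 1) \le \frac{2\,(2^{2^d - 1})^{K'}}{2^{K - K'}} + o_K(1)$$

for any $0 \le K' < K$. For fixed $K'$, the right-hand side can be made arbitrarily small by taking $K$ large, and then choosing $N$ large depending on $K$; thus the left-hand side is $o(1)$, which is (45). Finally, (46) follows from (41) and (44). □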



The global bad set $\Omega_0$ is somewhat annoying to deal with. Let us remove it by defining the modified dual function $\tilde{\mathcal{D}}f$ of $f$ as

$$\tilde{\mathcal{D}}f := (1 - 1_{\Omega_0})\, \mathcal{D}f.$$

Then Proposition 6.2 and (40) immediately imply

Corollary 6.4 (Boundedness of modified dual function). Let $f : X \to \mathbb{R}$ obey the pointwise bound $|f| \le \nu + 1$. Then $\tilde{\mathcal{D}}f$ takes values in the interval

$$I := [-2^{2^d}, 2^{2^d}]. \tag{47}$$

Furthermore we have the correlation property

$$\int_X f\, \tilde{\mathcal{D}}f = \|f\|^{2^d}_{U^{\vec Q([H]^t, W)}_{\sqrt{M}}} + o(1). \tag{48}$$



The second important estimate is easy to state, although non-trivial to prove:

Proposition 6.5 ($\nu - 1$ orthogonal to products of modified dual functions). Let $1 \le K \le 1/\eta_6$ be an integer, and let $f_1, \ldots, f_K : X \to \mathbb{R}$ be functions with the pointwise bounds $|f_k| \le \nu + 1$ for all $k \in [K]$. Then

$$\int_X \tilde{\mathcal{D}}f_1 \cdots \tilde{\mathcal{D}}f_K\, (\nu - 1) = o(1). \tag{49}$$

Remark 6.6. Note that (43) already gives an upper bound of $O(1)$ for (49): the whole point is thus to extract enough cancellation from the $\nu - 1$ factor to upgrade this bound to $o(1)$.



The rest of the section is devoted to the proof of Proposition 6.5. The argument follows that of [18, Section 6], and is based on a large number of applications of the Cauchy-Schwarz inequality, and the polynomial correlation condition (Definition 3.9). The arguments here are not used again elsewhere in this paper, and so the rest of this section may be read independently of the remainder of the paper.

We begin with a very simple reduction: from (45) we can replace the modified dual functions $\tilde{\mathcal{D}}f_k$ with their unmodified counterparts $\mathcal{D}f_k$. Our task is then to show

$$\int_X \mathcal{D}f_1 \cdots \mathcal{D}f_K\, (\nu - 1) = o(1). \tag{50}$$



6.7. A model example. Before we prove Proposition 6.5 in general, it is instructive to work with a simple example to illustrate the idea. Let us take an oversimplified toy model of the dual function $\mathcal{D}f$, namely

$$\mathcal{D}f := \mathbb{E}_{h \in [H]}\, \mathbb{E}_{m \in [\sqrt{M}]}\, T^{mh} f.$$

This does not quite correspond to a local Gowers norm$^{28}$, but will serve as an illustrative model nonetheless. Pick functions $f_1, \ldots, f_K$ with the pointwise bound $|f_k| \le \nu$ for $k \in [K]$ and consider the task of showing (50). We expand the left-hand side as

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]}\, \mathbb{E}_{m_1, \ldots, m_K \in [\sqrt{M}]} \int_X T^{m_1 h_1} f_1 \cdots T^{m_K h_K} f_K\, (\nu - 1). \tag{51}$$

Note that we cannot simply take absolute values and apply the pseudorandomness conditions, as these will give bounds of the form $O(1)$ rather than $o(1)$. One could instead attempt to apply the Cauchy-Schwarz inequality many times (as in the previous section); however, the fact that $K = O(1/\eta_6)$ could be very large compared to the pseudorandomness parameter $1/\eta_1$ defeats a naive implementation of this idea. Instead, we must perform a change of variables to introduce two new parameters $n^{(0)}, n^{(1)}$ to average over (which requires only a single Cauchy-Schwarz to estimate) rather than $K$ parameters (which would essentially require $K$ applications of Cauchy-Schwarz).

More precisely, we introduce two slightly less coarse-scale parameters $n^{(0)}, n^{(1)} \in [M^{1/4}]$ than $m_1, \ldots, m_K$. Define the multipliers $\hat h_k := \prod_{k' \in [K] \setminus \{k\}} h_{k'}$, thus $\hat h_k = O(H^{K-1})$, which is small compared to $M^{1/4}$ by the relative sizes of $\eta_7, \eta_6, \eta_2$. Shifting each of the $m_k$ by $\hat h_k (n^{(1)} - n^{(0)})$ and using (16), we conclude that (51) is equal to

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]}\, \mathbb{E}_{m_1, \ldots, m_K \in [\sqrt{M}]} \int_X \Big[\prod_{k \in [K]} T^{(m_k + \hat h_k (n^{(1)} - n^{(0)})) h_k} f_k\Big] (\nu - 1) + o_K(1)$$

for all $n^{(0)}, n^{(1)} \in [M^{1/4}]$. Averaging over all $n^{(0)}, n^{(1)}$ and shifting the integral by

$$n^{(0)} h_1 \cdots h_K = \hat h_1 n^{(0)} h_1 = \cdots = \hat h_K n^{(0)} h_K$$

we can thus write (51) as

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]}\, \mathbb{E}_{m_1, \ldots, m_K \in [\sqrt{M}]}\, \mathbb{E}_{n^{(0)}, n^{(1)} \in [M^{1/4}]} \int_X T^{n^{(1)} h_1 \cdots h_K}\big(T^{m_1 h_1} f_1 \cdots T^{m_K h_K} f_K\big)\; T^{n^{(0)} h_1 \cdots h_K}(\nu - 1) + o_K(1)$$

which we may factorize as

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]} \int_X \Big[\mathbb{E}_{n^{(1)} \in [M^{1/4}]} \prod_{k \in [K]} \mathbb{E}_{m \in [\sqrt{M}]} T^{n^{(1)} h_1 \cdots h_K + m h_k} f_k\Big] \Big[\mathbb{E}_{n^{(0)} \in [M^{1/4}]} T^{n^{(0)} h_1 \cdots h_K}(\nu - 1)\Big] + o_K(1).$$

$^{28}$ However, the slight variant $\mathbb{E}_{h \in [H]}\, \mathbb{E}_{m, m' \in [\sqrt{M}]}\, T^{(m - m')h} f$ does correspond to a (very simple) local Gowers norm, with $t = d = 1$ and $\vec Q = (h_1)$.
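By Cauchy-Schwarz it thus suffices to show that

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]} \int_X \Big[\mathbb{E}_{n^{(1)} \in [M^{1/4}]} \prod_{k \in [K]} \mathbb{E}_{m \in [\sqrt{M}]} T^{n^{(1)} h_1 \cdots h_K + m h_k} f_k\Big]^2 = O_K(1)$$

and

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]} \int_X \Big[\mathbb{E}_{n^{(0)} \in [M^{1/4}]} T^{n^{(0)} h_1 \cdots h_K}(\nu - 1)\Big]^2 = o_K(1).$$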



To prove the first estimate, we estimate $f_k$ by $\nu$ and expand out the square to reduce to showing

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]}\, \mathbb{E}_{n^{(1)}, n^{(2)} \in [M^{1/4}]} \int_X \prod_{i=1}^{2} \prod_{k \in [K]} \mathbb{E}_{m \in [\sqrt{M}]} T^{n^{(i)} h_1 \cdots h_K + m h_k}\, \nu = O_K(1),$$

but this follows from the correlation condition (18) (for $\eta_1$ small enough$^{29}$). To prove the second estimate, we again expand out the square and reduce to showing

$$\mathbb{E}_{h_1, \ldots, h_K \in [H]} \int_X \Big[\mathbb{E}_{n^{(0)} \in [M^{1/4}]} T^{n^{(0)} h_1 \cdots h_K}\, \nu\Big]^j = 1 + o_K(1)$$

for $j = 0, 1, 2$, which will again follow from (18) for $\eta_1$ small enough.
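The change of variables in the model example rests on two elementary facts: multiplying the $k$-th multiplier by $h_k$ always gives the full product $h_1 \cdots h_K$, and shifting an averaging window of length $M'$ by $s$ changes the average of a bounded function by at most $2|s|/M'$ times its supremum. The sketch below checks both; it is illustrative only, the names `multipliers` and `window_average` are hypothetical, and the bounded test function stands in for the actual integrands, whose size in the paper is controlled through (16) rather than by a uniform bound.

```python
from math import prod

def multipliers(h):
    """Return [hat h_1, ..., hat h_K], where hat h_k is the product of all
    h_{k'} with k' != k (an empty product is 1)."""
    return [prod(h[:k] + h[k + 1:]) for k in range(len(h))]

def window_average(F, M, s=0):
    """Average of F(m + s) over m in [M] = {1, ..., M}."""
    return sum(F(m + s) for m in range(1, M + 1)) / M
```

The shifted and unshifted windows differ in at most $2|s|$ points, which is why a shift of size $O(H^{K-1})$ is harmless once it is small compared with the window length.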



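6.8. Conclusion of argument. Now we prove (50) in the general case. We may take $d \ge 1$, since the $d = 0$ case follows from (15). We expand the left-hand side as

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m} \int_X \Big[\prod_{k \in [K]} \prod_{\omega \in \{0,1\}^d \setminus \{0\}^d} T^{\sum_{i \in [d]} (m_i^{(k, \omega_i)} - m_i^{(k, 0)})\, Q_i(\vec h^{(k)}, W)} f_k\Big] (\nu - 1)$$

where $\omega = (\omega_1, \ldots, \omega_d)$ and we use the abbreviations

$$\mathbb{E}_{\vec h} := \mathbb{E}_{\vec h^{(1)}, \ldots, \vec h^{(K)} \in [H]^t}, \qquad \mathbb{E}_{\vec m} := \mathbb{E}_{\vec m^{(1,0)}, \ldots, \vec m^{(K,0)}, \vec m^{(1,1)}, \ldots, \vec m^{(K,1)} \in [\sqrt{M}]^d}.$$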



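$^{29}$ It is important to note however that $\eta_1$ does not have to be small relative to $K$ or to parameters such as $\eta_6$.

We introduce moderately coarse-scale parameters $n_i^{(0)}, n_i^{(1)} \in [M^{1/4}]$ for $i \in [d]$ and the multipliers

$$\hat h_{k,i} = \hat h_{k,i}(\vec h, W) := \prod_{k' \in [K] \setminus \{k\}} Q_i(\vec h^{(k')}, W).$$

Observe that $\hat h_{k,i} = O(H^{O(K)})$, which will be much smaller than $M^{1/4}$ by the relative sizes of $\eta_7, \eta_6, \eta_2$. Shifting each $m_i^{(k,j)}$ by $\hat h_{k,i}(n_i^{(j)} - n_i^{(0)})$ and using (16), we can then rewrite the left-hand side of (50) as

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m} \int_X \Big[\prod_{k \in [K]} \prod_{\omega \in \{0,1\}^d \setminus \{0\}^d} T^{\sum_{i \in [d]} \big(m_i^{(k, \omega_i)} - m_i^{(k, 0)} + \hat h_{k,i}(n_i^{(\omega_i)} - n_i^{(0)})\big)\, Q_i(\vec h^{(k)}, W)} f_k\Big] (\nu - 1) + o_K(1)$$

for any $n_1^{(0)}, \ldots, n_d^{(0)}, n_1^{(1)}, \ldots, n_d^{(1)} \in [M^{1/4}]$. Now from construction we have $\hat h_{k,i}\, Q_i(\vec h^{(k)}, W) = b_i$, where

$$b_i = b_i(\vec h, W) := \prod_{k \in [K]} Q_i(\vec h^{(k)}, W)$$

(note that $b_i \ne 0$ by hypothesis on $\vec Q$) and on averaging over the $n$ variables, we can write the left-hand side of (50) as

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m}\, \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \int_X \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} (n_i^{(\omega_i)} - n_i^{(0)})\, b_i}\, g_{\omega, \vec h, \vec m} + o_K(1)$$

where $\vec n^{(i)} = (n_1^{(i)}, \ldots, n_d^{(i)})$ will be understood to range over $[M^{1/4}]^d$ for $i = 0, 1$,

$$g_{\omega, \vec h, \vec m} := \prod_{k \in [K]} T^{\sum_{i \in [d]} (m_i^{(k, \omega_i)} - m_i^{(k, 0)})\, Q_i(\vec h^{(k)}, W)} f_k$$

for $\omega \in \{0,1\}^d \setminus \{0\}^d$, and

$$g_{\{0\}^d, \vec h, \vec m} := \nu - 1.$$

Shifting the integral by $T^{\sum_{i \in [d]} n_i^{(0)} b_i}$ we can rewrite this as

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, g_{\omega, \vec h, \vec m} + o_K(1).$$

Now use the Cauchy-Schwarz-Gowers inequality (99) to obtain the pointwise estimate

$$\Big|\mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, g_{\omega, \vec h, \vec m}\Big| \le \prod_{\omega' \in \{0,1\}^d} \Big|\mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, g_{\omega', \vec h, \vec m}\Big|^{1/2^d}.$$

By Hölder's inequality, we thus see that to prove (50) it suffices to show that the quantity

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, g_{\omega', \vec h, \vec m} \tag{52}$$

is $O_K(1)$ when $\omega' \in \{0,1\}^d \setminus \{0\}^d$ and is $o_K(1)$ when $\omega' = 0^d$.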
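Let us first deal with the case when $\omega' \ne 0^d$. Our task is to show that

$$\mathbb{E}_{\vec h}\, \mathbb{E}_{\vec m} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} \prod_{k \in [K]} T^{\sum_{i \in [d]} \big((m_i^{(k, \omega'_i)} - m_i^{(k, 0)})\, Q_i(\vec h^{(k)}, W) + n_i^{(\omega_i)} b_i\big)} f_k = O_K(1).$$

We can bound $f_k$ pointwise by $\nu$, and factorise the left-hand side as

$$\mathbb{E}_{\vec h} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{k \in [K]} \mathbb{E}_{\vec m^{(k,0)}, \vec m^{(k,1)} \in [\sqrt{M}]^d} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} \big((m_i^{(k, \omega'_i)} - m_i^{(k, 0)})\, Q_i(\vec h^{(k)}, W) + n_i^{(\omega_i)} b_i\big)}\, \nu.$$

But this is $1 + o_K(1) = O_K(1)$ by the polynomial correlation condition (Definition 3.9) with $L = 0$ (here we use the fact that the $b_i$ are non-zero polynomials of $\vec h$ and $W$). Here we need $\eta_1$ to be sufficiently small depending on $t, d, \vec Q$ but not on $K$.

Finally, we have to deal with the case $\omega' = 0^d$. Since $g_{\{0\}^d, \vec h, \vec m} = \nu - 1$ and $b_i = b_i(\vec h, W)$ are independent of $\vec m$, we can rewrite (52) as

$$\mathbb{E}_{\vec h} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in \{0,1\}^d} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, (\nu - 1)$$

and so by the binomial formula it suffices to show that

$$\mathbb{E}_{\vec h} \int_X \mathbb{E}_{\vec n^{(0)}, \vec n^{(1)}} \prod_{\omega \in A} T^{\sum_{i \in [d]} n_i^{(\omega_i)} b_i}\, \nu = 1 + o_K(1)$$

for all $A \subset \{0,1\}^d$. But this follows from the polynomial correlation condition (18) (with $K = 0$), again taking $\eta_1$ sufficiently small depending on $t, d, \vec Q$. This concludes the proof of (50), and hence Proposition 6.5.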

7. Proof of structure theorem 

We can now complete the proof of the structure theorem by using the arguments of [18, §7-8] more or less verbatim. In fact these arguments can be abstracted as follows.

Theorem 7.1 (Abstract structure theorem). Let $I$ be an interval bounded by $O(1)$. Let $\nu : X \to \mathbb{R}^+$ be any measure, and let $f \mapsto \mathcal{D}f$ be a (nonlinear) operator obeying the following properties:

• If we have the pointwise bound $|f| \le \nu + 1$, then $\mathcal{D}f : X \to I$ takes values in $I$, in particular

$$\mathcal{D}f = O(1). \tag{53}$$

• If $1 \le K \le 1/\eta_6$, and $f_1, \ldots, f_K : X \to \mathbb{R}$ are functions with the pointwise bound $|f_k| \le \nu + 1$ for all $k \in [K]$, then (49) holds.

Then for any $g : X \to \mathbb{R}^+$ with the pointwise bound $0 \le g \le \nu$, there exist functions $g_{U^\perp}, g_U : X \to \mathbb{R}$ obeying the estimates (26), (27), (28), and

$$\Big|\int_X g_U\, \mathcal{D}g_U\Big| \le \eta_4. \tag{54}$$

Indeed, Theorem 4.7 immediately follows by applying Theorem 7.1 to the modified dual function $\tilde{\mathcal{D}}$, with (29) following from (54) and (48).

In the remainder of this section we prove Theorem 7.1. Henceforth we fix $I, \nu, \mathcal{D}$ obeying the hypotheses of the theorem.

7.2. Factors. As in [18] we shall recall the very useful notion of a factor from ergodic theory, though for our applications we actually only need the finitary version of this concept.

Let us set $\mathcal{X}$ to be the probability space $\mathcal{X} = (X, \mathcal{B}_X, \mu_X)$, where $X = \mathbb{Z}_N$, $\mathcal{B}_X = 2^X$ is the power set of $X$, and $\mu_X$ is the uniform probability measure on $X$. We define a factor$^{30}$ to be a quadruple $\mathcal{Y} = (Y, \mathcal{B}_Y, \mu_Y, \pi_Y)$, where $(Y, \mathcal{B}_Y, \mu_Y)$ is a probability space (thus $\mathcal{B}_Y$ is a $\sigma$-algebra on $Y$ and $\mu_Y$ is a probability measure on $\mathcal{B}_Y$) together with a measurable map $\pi_Y : X \to Y$ such that $(\pi_Y)_* \mu_X = \mu_Y$, or in other words $\mu_X(\pi_Y^{-1}(E)) = \mu_Y(E)$ for all $E \in \mathcal{B}_Y$. The factor map $\pi_Y$ induces the pullback map $\pi_Y^* : L^2(\mathcal{Y}) \to L^2(\mathcal{X})$ and its adjoint $(\pi_Y)_* : L^2(\mathcal{X}) \to L^2(\mathcal{Y})$, where $L^2(\mathcal{X})$ is the usual Lebesgue space of square-integrable functions on $\mathcal{X}$. We refer to the projection $\pi_Y^* (\pi_Y)_* : L^2(\mathcal{X}) \to L^2(\mathcal{X})$ as the conditional expectation operator, and denote $\pi_Y^* (\pi_Y)_*(f)$ by $\mathbb{E}(f|\mathcal{Y})$; this is a linear self-adjoint orthogonal projection from $L^2(\mathcal{X})$ to $\pi_Y^* L^2(\mathcal{Y})$.

The conditional expectation operator is in fact completely determined by the $\sigma$-algebra $\pi_Y^{-1}(\mathcal{B}_Y) \subset \mathcal{B}_X$. As $X$ is finite (with every point having positive measure), $\pi_Y^{-1}(\mathcal{B}_Y)$ is generated by a partition of $X$ into atoms (which by abuse of notation we refer to as atoms of the factor $\mathcal{Y}$), and the conditional expectation is given explicitly by the formula

$$\mathbb{E}(f|\mathcal{Y})(x) = \mathbb{E}_{y \in B(x)} f(y)$$

where $B(x)$ is the unique atom of $\pi_Y^{-1}(\mathcal{B}_Y)$ which contains $x$. We refer to the number of atoms of $\mathcal{Y}$ as the complexity of the factor $\mathcal{Y}$. By abuse of notation we say that a function $f : X \to \mathbb{R}$ is measurable with respect to $\mathcal{Y}$ if it is measurable with respect to $\pi_Y^{-1}(\mathcal{B}_Y)$, or equivalently if it is constant on all atoms of $\mathcal{Y}$. Thus for instance $(\pi_Y)^* L^q(\mathcal{Y})$ consists of the functions in $L^q(\mathcal{X})$ which are measurable with respect to $\mathcal{Y}$.

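If $\mathcal{Y} = (Y, \mathcal{B}_Y, \mu_Y, \pi_Y)$ and $\mathcal{Y}' = (Y', \mathcal{B}_{Y'}, \mu_{Y'}, \pi_{Y'})$ are two factors, we may form their join $\mathcal{Y} \vee \mathcal{Y}' = (Y \times Y', \mathcal{B}_Y \times \mathcal{B}_{Y'}, \mu_Y \times \mu_{Y'}, \pi_Y \oplus \pi_{Y'})$ in the obvious manner; note that the atoms of $\mathcal{Y} \vee \mathcal{Y}'$ are simply the non-empty intersections of atoms of $\mathcal{Y}$ with atoms of $\mathcal{Y}'$, and so any function which is measurable with respect to $\mathcal{Y}$ or $\mathcal{Y}'$ is automatically measurable with respect to $\mathcal{Y} \vee \mathcal{Y}'$.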
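In the finitary setting a factor is nothing more than a partition of $X$ into atoms, so the conditional expectation and the join are easy to compute directly. The sketch below is a minimal illustration under that viewpoint: a factor is represented by a list `labels` giving the atom of each point of $X = \mathbb{Z}_N$, and the function names `cond_exp`, `join` and `inner` are illustrative choices, not notation from the paper.

```python
from collections import defaultdict

def cond_exp(f, labels):
    """E(f|Y)(x): the average of f over the atom B(x) containing x.
    labels[x] is any hashable name for the atom of x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, atom in zip(f, labels):
        sums[atom] += value
        counts[atom] += 1
    return [sums[atom] / counts[atom] for atom in labels]

def join(labels1, labels2):
    """Atoms of the join: non-empty intersections of atoms, named by pairs."""
    return list(zip(labels1, labels2))

def inner(f, g):
    """Inner product on L^2(X) for the uniform probability measure."""
    return sum(a * b for a, b in zip(f, g)) / len(f)
```

Applying `cond_exp` twice gives the same result as applying it once, and `inner(cond_exp(f, L), g)` agrees with `inner(f, cond_exp(g, L))`, which is the self-adjoint projection property stated above.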



$^{30}$ In infinitary ergodic theory one also requires the probability spaces $\mathcal{X}$, $\mathcal{Y}$ to be invariant under the shift $T$, and for the factor map $\pi$ to respect the shift. In the finitary setting it is unrealistic to demand these shift-invariances, for if $N$ were prime then this would mean that there were no non-trivial factors whatsoever. While there are concepts of "approximate shift-invariance" which can be used as a substitute, see [34], we will fortunately not need to use them here, as the remainder of the argument does not even involve the shift $T$ at all.

Note that any function $f : X \to \mathbb{R}$ automatically generates a factor $(\mathbb{R}, \mathcal{B}_{\mathbb{R}}, f_* \mu_X, f)$, where $\mathcal{B}_{\mathbb{R}}$ is the Borel $\sigma$-algebra, which is the minimal factor with respect to which $f$ is (Borel-) measurable. In our finitary setting it turns out we need a discretized version of this construction, which we give as follows.

Proposition 7.3 (Each function generates a factor). For any function $G : X \to I$ there exists a factor $\mathcal{Y}(G)$ with the following properties:

• (G lies in its own factor) For any factor $\mathcal{Y}'$, we have

$$G = \mathbb{E}(G|\mathcal{Y}(G) \vee \mathcal{Y}') + O(\eta_4^2). \tag{55}$$

• (Bounded complexity) $\mathcal{Y}(G)$ has at most $O_{\eta_4}(1)$ atoms.

• (Approximation by continuous functions of G) If $A$ is any atom in $\mathcal{Y}(G)$, then there exists a polynomial $\Psi_A : \mathbb{R} \to \mathbb{R}$ of degree $O_{\eta_5}(1)$ and coefficients $O_{\eta_5}(1)$ such that

$$\Psi_A(x) \in [0, 1] \text{ for all } x \in I \tag{56}$$

and

$$\int_X |1_A - \Psi_A(G)|\, (\nu + 1) = O(\eta_5). \tag{57}$$

Proof. This is essentially [18, Proposition 7.2], but we shall give a complete proof here for the convenience of the reader.

We use the probabilistic method. Let $\alpha$ be a real number in the interval $[0, 1]$, chosen at random. We then define the factor

$$\mathcal{Y}(G) := (\mathbb{R}, \mathcal{B}_{\eta_4^2, \alpha}, G_* \mu_X, G)$$

where $\mathcal{B}_{\eta_4^2, \alpha}$ is the $\sigma$-algebra on the real line $\mathbb{R}$ generated by the intervals $[(n + \alpha)\eta_4^2, (n + \alpha + 1)\eta_4^2)$ for $n \in \mathbb{Z}$. This is clearly a factor of $\mathcal{X}$, with atoms $A_{n,\alpha} := G^{-1}([(n + \alpha)\eta_4^2, (n + \alpha + 1)\eta_4^2))$. Since $G$ ranges in $I$, and we allow constants to depend on $I$, it is clear that there are at most $O_{\eta_4}(1)$ non-empty atoms and that $G$ fluctuates by at most $O(\eta_4^2)$ on each atom, which yields the first two desired properties. It remains to verify that with positive probability, the approximation by continuous functions property holds for all atoms $A_{n,\alpha}$. By the union bound, it suffices to show that each individual atom $A_{n,\alpha}$ has the approximation property with probability $1 - O(\eta_5)$.

By the Weierstrass approximation theorem, we can find for each $\alpha$ a polynomial $\Psi_{A_{n,\alpha}}$ obeying (56) which is equal to $1_{[(n + \alpha)\eta_4^2, (n + \alpha + 1)\eta_4^2)} + O(\eta_5)$ outside of the set

$$E_{n,\alpha} := [(n + \alpha - \eta_5^2)\eta_4^2, (n + \alpha + \eta_5^2)\eta_4^2] \cup [(n + \alpha + 1 - \eta_5^2)\eta_4^2, (n + \alpha + 1 + \eta_5^2)\eta_4^2].$$

Simple compactness arguments allow us to take $\Psi_{A_{n,\alpha}}$ to have degree $O_{\eta_5}(1)$ and coefficients $O_{\eta_5}(1)$. Since

$$1_{A_{n,\alpha}} = 1_{[(n + \alpha)\eta_4^2, (n + \alpha + 1)\eta_4^2)}(G),$$

we thus conclude (from (15)) that

$$\int_X |1_{A_{n,\alpha}} - \Psi_{A_{n,\alpha}}(G)|\, (\nu + 1) \le O(\eta_5) + O\Big(\int_X 1_{E_{n,\alpha}}(G)\, (\nu + 1)\Big).$$

By Markov's inequality, it thus suffices to show that

$$\int_0^1 \Big[\int_X 1_{E_{n,\alpha}}(G)\, (\nu + 1)\Big]\, d\alpha = O(\eta_5^2).$$

But this follows from Fubini's theorem, (15), and the elementary pointwise estimate

$$\int_0^1 1_{E_{n,\alpha}}(G)\, d\alpha = O(\eta_5^2).$$

□
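The first two properties in the proof above (bounded complexity and "G lies in its own factor") are purely combinatorial and can be illustrated directly. The following sketch is hedged accordingly: `eps` plays the role of the interval length $\eta_4^2$, `alpha` is the random offset, and the names `atom_labels` and `cond_exp` are illustrative; the polynomial approximation property (57) is not modelled here.

```python
import math
from collections import defaultdict

def atom_labels(G, eps, alpha):
    """Label each x by the integer n with
    (n + alpha) * eps <= G(x) < (n + alpha + 1) * eps."""
    return [math.floor(value / eps - alpha) for value in G]

def cond_exp(f, labels):
    """Average of f over each atom of the partition described by labels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, atom in zip(f, labels):
        sums[atom] += value
        counts[atom] += 1
    return [sums[atom] / counts[atom] for atom in labels]
```

Since all values of $G$ on one atom lie in an interval of length `eps`, the atom average differs from $G$ by at most `eps` at every point, and the same remains true after refining the partition by any second factor.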

Henceforth we set $\mathcal{Y}(G)$ to be the factor given by the above proposition. A key consequence of the hypotheses of Theorem 7.1 is that $\nu - 1$ is well distributed with respect to any finite combination of these factors:

Proposition 7.4 ($\nu$ uniformly distributed wrt dual function factors). Let $K \ge 1$ be an integer with $K = O_{\eta_4}(1)$, and let $f_1, \ldots, f_K : X \to \mathbb{R}$ be functions with the pointwise bounds $|f_k| \le \nu + 1$ for all $1 \le k \le K$. Let $\mathcal{Y} := \mathcal{Y}(\mathcal{D}f_1) \vee \ldots \vee \mathcal{Y}(\mathcal{D}f_K)$. Then we have

$$\mathcal{D}f_k = \mathbb{E}(\mathcal{D}f_k|\mathcal{Y}) + O(\eta_4^2) \tag{58}$$

for all $k \in [K]$, and we have a $\mathcal{Y}$-measurable set $\Omega \subset X$ obeying the smallness bound

$$\int_X 1_\Omega\, (\nu + 1) = O_{\eta_4}(\eta_5^{1/2}) \tag{59}$$

and we have the pointwise bound

$$|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1|\mathcal{Y})| \le O_{\eta_4}(\eta_5^{1/2}). \tag{60}$$



Proof. We repeat the arguments from [18, Proposition 7.3]. The claim (58) follows immediately from (55), so we turn to the other two properties. Since each $\mathcal{Y}(\mathcal{D}f_k)$ is generated by $O_{\eta_4}(1)$ atoms, $\mathcal{Y}$ is generated by $O_{\eta_4, K}(1) = O_{\eta_4}(1)$ atoms. Call an atom $A$ of $\mathcal{Y}$ small if $\int_X 1_A (\nu + 1) \le \eta_5^{1/2}$, and let $\Omega$ be the union of all the small atoms; then $\Omega$ is clearly $\mathcal{Y}$-measurable and obeys (59). It remains to prove (60), or equivalently that

$$\frac{\int_X 1_A (\nu - 1)}{\int_X 1_A} = \mathbb{E}_{y \in A}\, \nu(y) - 1 = O_{\eta_4}(\eta_5^{1/2}) + o(1)$$

for all non-small atoms $A$.

Fix $A$. Since $A$ is not small, we have

$$\int_X 1_A (\nu - 1) + 2 \int_X 1_A = \int_X 1_A (\nu + 1) > \eta_5^{1/2}.$$

Hence it will suffice to show that

$$\int_X 1_A (\nu - 1) = O_{\eta_4}(\eta_5) + o(1).$$

On the other hand, since $A$ is the intersection of atoms $A_1, \ldots, A_K$ from $\mathcal{Y}(\mathcal{D}f_1), \ldots, \mathcal{Y}(\mathcal{D}f_K)$, we see from Proposition 7.3 and an easy induction argument that there exists a polynomial $\Psi : \mathbb{R}^K \to \mathbb{R}$ of degree $O_{\eta_5}(1)$ and coefficients $O_{\eta_5}(1)$ which maps $I^K$ into $[0, 1]$ such that

$$\int_X |1_A - \Psi(\mathcal{D}f_1, \ldots, \mathcal{D}f_K)|\, (\nu + 1) = O_{\eta_4}(\eta_5).$$

In particular

$$\int_X (1_A - \Psi(\mathcal{D}f_1, \ldots, \mathcal{D}f_K))\, (\nu - 1) = O_{\eta_4}(\eta_5).$$

On the other hand, by decomposing $\Psi$ into monomials and using (49) (assuming $\eta_6$ sufficiently small depending on $\eta_5$) we have

$$\int_X \Psi(\mathcal{D}f_1, \ldots, \mathcal{D}f_K)\, (\nu - 1) = o(1)$$

and the claim follows (we can absorb the $o(1)$ error by taking $N$ large enough). □



7.5. The inductive step. The proof of the abstract structure theorem proceeds by a stopping time argument. To clarify this argument we introduce a somewhat artificial definition.

Definition 7.6 (Structured factor). A structured factor is a tuple $\mathcal{Y}_K = (\mathcal{Y}_K, K, F_1, \ldots, F_K, \Omega_K)$, where $K \ge 0$ is an integer, $F_1, \ldots, F_K : X \to \mathbb{R}$ are functions with the pointwise bound $|F_k| \le \nu + 1$ for all $k \in [K]$, $\mathcal{Y}_K$ is the factor $\mathcal{Y}_K := \mathcal{Y}(\mathcal{D}F_1) \vee \ldots \vee \mathcal{Y}(\mathcal{D}F_K)$, and $\Omega_K \subset X$ is a $\mathcal{Y}_K$-measurable set. We refer to $K$ as the order of the structured factor, and $\Omega_K$ as the exceptional set. We say that the structured factor has noise level $\sigma$ for some $\sigma > 0$ if we have the smallness bound

$$\int_X 1_{\Omega_K}\, (\nu + 1) \le \sigma$$

and the pointwise bound

$$|(1 - 1_{\Omega_K})\, \mathbb{E}(\nu - 1|\mathcal{Y}_K)| \le \sigma. \tag{61}$$

If $g : X \to \mathbb{R}^+$ is the function in Theorem 7.1 we define the energy $\mathcal{E}_g(\mathcal{Y}_K)$ of the structured factor $\mathcal{Y}_K$ relative to $g$ to be the quantity

$$\mathcal{E}_g(\mathcal{Y}_K) := \int_X (1 - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)^2.$$



If $\mathcal{Y}_K$ has noise level $\sigma \le 1$, then since $g$ is bounded in magnitude by $\nu$, observe that

$$|(1 - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)| \le (1 - 1_{\Omega_K})\, (\mathbb{E}(\nu - 1|\mathcal{Y}_K) + 1) \le 1 + \sigma \le 2 \tag{62}$$

and so we conclude the energy bound

$$0 \le \mathcal{E}_g(\mathcal{Y}_K) \le 4. \tag{63}$$

This will allow us to apply an energy increment argument to obtain Theorem 7.1. More precisely, Theorem 7.1 is obtained from the following inductive step.

Proposition 7.7 (Inductive step). Let $\mathcal{Y}_K = (\mathcal{Y}_K, K, F_1, \ldots, F_K, \Omega_K)$ be a structured factor of order $K$ and noise level $0 < \sigma < \eta_4^4$. Then, if we set

$$F_{K+1} := \frac{1}{1 + \sigma}\, (1 - 1_{\Omega_K})\, (g - \mathbb{E}(g|\mathcal{Y}_K)) \tag{64}$$

and we suppose that

$$\Big|\int_X F_{K+1}\, \mathcal{D}F_{K+1}\Big| > \eta_4 \tag{65}$$

then there exists a structured factor $\mathcal{Y}_{K+1} = (\mathcal{Y}_{K+1}, K + 1, F_1, \ldots, F_K, F_{K+1}, \Omega_{K+1})$ of order $K + 1$ and noise level $\sigma + O_{\eta_4}(\eta_5^{1/2})$ with the energy increment property

$$\mathcal{E}_g(\mathcal{Y}_{K+1}) \ge \mathcal{E}_g(\mathcal{Y}_K) + c\, \eta_4^2 \tag{66}$$

for some constant $c > 0$ (depending only on $I$).

Let us assume Proposition 7.7 for the moment and deduce Theorem 7.1. Starting with a trivial structured factor $\mathcal{Y}_0$ of order 0 and noise level 0, and iterating Proposition 7.7 repeatedly (and using (63) to prevent the iteration from proceeding for more than $4/c\eta_4^2 = O_{\eta_4}(1)$ steps), we may find a structured factor $\mathcal{Y}_K$ of order $K = O_{\eta_4}(1)$ and noise level

$$\sigma = O_{\eta_4}(\eta_5^{1/2}) < \eta_4^4 \tag{67}$$

such that the function $F_{K+1}$ defined in (64) obeys the bound

$$\Big|\int_X F_{K+1}\, \mathcal{D}F_{K+1}\Big| \le \eta_4.$$

If we thus set $g_U := F_{K+1}$ and

$$g_{U^\perp} := \frac{1}{1 + \sigma}\, (1 - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)$$

then we easily verify (26) and (54), while (27) follows from (61), since $\mathbb{E}(g|\mathcal{Y}_K) \le 1 + \mathbb{E}(\nu - 1|\mathcal{Y}_K)$. To prove (28), we see from (67) that it suffices to show that

$$\int_X (1 - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K) = \int_X g - O_{\eta_4}(\eta_5^{1/2}).$$

Since $\Omega_K$ is $\mathcal{Y}_K$-measurable, the left-hand side is $\int_X g - \int_X 1_{\Omega_K} g$. But the claim then follows from (61) and (67). This proves Theorem 7.1.

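It remains to prove Proposition 7.7. Set $\mathcal{Y}_{K+1} := \mathcal{Y}_K \vee \mathcal{Y}(\mathcal{D}F_{K+1}) = \mathcal{Y}(\mathcal{D}F_1) \vee \ldots \vee \mathcal{Y}(\mathcal{D}F_{K+1})$. Now from Proposition 7.4 we can find a $\mathcal{Y}_{K+1}$-measurable set $\Omega$ obeying the smallness bound (59) and the pointwise bound

$$|(1 - 1_\Omega)\, \mathbb{E}(\nu - 1|\mathcal{Y}_{K+1})| \le O_{\eta_4}(\eta_5^{1/2}). \tag{68}$$

Now set $\Omega_{K+1} := \Omega_K \cup \Omega$. This is still $\mathcal{Y}_{K+1}$-measurable and $\int_X 1_{\Omega_{K+1}} (\nu + 1) \le \sigma + O_{\eta_4}(\eta_5^{1/2})$; from (68) we thus conclude that $\mathcal{Y}_{K+1}$ has noise level $\sigma + O_{\eta_4}(\eta_5^{1/2})$. Thus the only thing left to verify is the energy increment property (66).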

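From (64), (65) we have

$$\Big|\int_X (1 - 1_{\Omega_K})\, (g - \mathbb{E}(g|\mathcal{Y}_K))\, \mathcal{D}F_{K+1}\Big| \ge \eta_4 - O(\eta_4^2). \tag{69}$$

Now from (53), the pointwise bound $0 \le g \le \nu$, (62), and (59) we have

$$\Big|\int_X (1_{\Omega_{K+1}} - 1_{\Omega_K})\, (g - \mathbb{E}(g|\mathcal{Y}_K))\, \mathcal{D}F_{K+1}\Big| \le O\Big(\int_X (1_{\Omega_{K+1}} - 1_{\Omega_K})\, (\nu + 1)\Big)$$

and hence by (69)

$$\Big|\int_X (1 - 1_{\Omega_{K+1}})\, (g - \mathbb{E}(g|\mathcal{Y}_K))\, \mathcal{D}F_{K+1}\Big| \ge \eta_4 - O(\eta_4^2).$$

Next, from (58), the pointwise bound $0 \le g \le \nu$ and (15) we have

$$\Big|\int_X (1 - 1_{\Omega_{K+1}})\, (g - \mathbb{E}(g|\mathcal{Y}_K))\, (\mathcal{D}F_{K+1} - \mathbb{E}(\mathcal{D}F_{K+1}|\mathcal{Y}_{K+1}))\Big| \le \int_X (\nu + 1)\, O(\eta_4^2) = O(\eta_4^2)$$

and thus

$$\Big|\int_X (1 - 1_{\Omega_{K+1}})\, (g - \mathbb{E}(g|\mathcal{Y}_K))\, \mathbb{E}(\mathcal{D}F_{K+1}|\mathcal{Y}_{K+1})\Big| \ge \eta_4 - O(\eta_4^2).$$

Since $\Omega_{K+1}$, $\mathbb{E}(g|\mathcal{Y}_K)$, and $\mathbb{E}(\mathcal{D}F_{K+1}|\mathcal{Y}_{K+1})$ are already $\mathcal{Y}_{K+1}$-measurable, we conclude

$$\Big|\int_X (1 - 1_{\Omega_{K+1}})\, (\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K))\, \mathbb{E}(\mathcal{D}F_{K+1}|\mathcal{Y}_{K+1})\Big| \ge \eta_4 - O(\eta_4^2).$$

By (53) and Cauchy-Schwarz we conclude that

$$\int_X (1 - 1_{\Omega_{K+1}})\, |\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)|^2 \ge 2c\, \eta_4^2 - O(\eta_4^3) \tag{70}$$

for some $c > 0$.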

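To pass from this to (66), first observe from (62) and (59) that

$$\int_X (1_{\Omega_{K+1}} - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)^2 = O_{\eta_4}(\eta_5^{1/2})$$

and so by the triangle inequality and (63), (66) will follow from the estimate

$$\int_X (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(g|\mathcal{Y}_{K+1})^2 \ge \int_X (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(g|\mathcal{Y}_K)^2 + 2c\, \eta_4^2 - O(\eta_4^3).$$

Using the identity

$$\mathbb{E}(g|\mathcal{Y}_{K+1})^2 = \mathbb{E}(g|\mathcal{Y}_K)^2 + |\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)|^2 + 2\, \mathbb{E}(g|\mathcal{Y}_K)\, (\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K))$$

and (70), it will suffice to show that

$$\int_X (1 - 1_{\Omega_{K+1}})\, \mathbb{E}(g|\mathcal{Y}_K)\, (\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)) = O(\eta_4^3).$$

Now observe that $\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)$ is orthogonal to all $\mathcal{Y}_K$-measurable functions, and in particular

$$\int_X (1 - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)\, (\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)) = 0.$$

Thus it suffices to show that

$$\int_X (1_{\Omega_{K+1}} - 1_{\Omega_K})\, \mathbb{E}(g|\mathcal{Y}_K)\, (\mathbb{E}(g|\mathcal{Y}_{K+1}) - \mathbb{E}(g|\mathcal{Y}_K)) = O(\eta_4^3).$$

Since everything here is $\mathcal{Y}_{K+1}$-measurable, we may replace $\mathbb{E}(g|\mathcal{Y}_{K+1})$ by $g$. Using (62) it then suffices to show

$$\int_X (1_{\Omega_{K+1}} - 1_{\Omega_K})\, |g - \mathbb{E}(g|\mathcal{Y}_K)| = O(\eta_4^3).$$

But this follows from the pointwise bound $0 \le g \le \nu$, from (62), and (59). This concludes the proof of Proposition 7.7, which in turn implies Theorem 7.1 and thus Theorem 4.7.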
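The energy increment mechanism can be seen in a simplified setting with no exceptional sets: if a partition is refined, the integrated cross term in the identity for $\mathbb{E}(g|\mathcal{Y}_{K+1})^2$ vanishes by orthogonality, so the energy grows by exactly the squared $L^2$ distance between the two conditional expectations. The sketch below checks this Pythagorean identity numerically. It is an illustration under the stated simplification ($\Omega_K = \Omega_{K+1} = \emptyset$), and the names `cond_exp`, `integral` and `energy` are hypothetical.

```python
from collections import defaultdict

def cond_exp(g, labels):
    """Average of g over each atom of the partition described by labels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for value, atom in zip(g, labels):
        sums[atom] += value
        counts[atom] += 1
    return [sums[atom] / counts[atom] for atom in labels]

def integral(f):
    """Normalised integral over X (uniform probability measure)."""
    return sum(f) / len(f)

def energy(g, labels):
    """Integral of E(g|Y)^2, i.e. the energy with empty exceptional set."""
    return integral([v * v for v in cond_exp(g, labels)])
```

Because the energy of a function bounded by a constant is itself bounded, a fixed gain at every step forces the iteration to stop after boundedly many steps, which is the role played by (63) and (66).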



8. A PSEUDORANDOM MEASURE WHICH MAJORIZES THE PRIMES 

In the remainder of the paper we prove Theorem 3.18, which constructs the pseudorandom measure $\nu$ which will pointwise dominate the function $f$ defined in (11). As in all previous sections we are using the notation from Section 3 to define quantities such as $W, R, M, b$.

The measure $\nu$ can in fact be described explicitly, following [32], [33], [20]. Let $\chi : \mathbb{R} \to \mathbb{R}$ be a fixed smooth even function which vanishes outside of the interval $[-1, 1]$ and obeys the normalization

$$\int_0^1 |\chi'(t)|^2\, dt = 1 \tag{71}$$

but is otherwise arbitrary$^{31}$. We then define $\nu$ by the formula

$$\nu(x) := \frac{\phi(W)}{W}\, \log R\, \Big(\sum_{m | Wx + b} \mu(m)\, \chi\Big(\frac{\log m}{\log R}\Big)\Big)^2 \tag{72}$$

for $x \in [N]$, where the sum is over all positive integers $m$ which divide $Wx + b$, and $\mu(m)$ is the Möbius function of $m$, defined as $(-1)^k$ when $m$ is the product of $k$ distinct primes for some $k \ge 0$, and zero otherwise (i.e. zero when $m$ is divisible by a non-trivial square).

Remark 8.1. The definition of $\nu$ may seem rather complicated, but its behavior is in fact rather easily controlled, at least at "coarse scales" (averaging $x$ over intervals of length greater than a large power of $R$), by sieve theory techniques, and in particular by a method of Goldston and Yıldırım [12], though in the paper here we exploit the smoothness of the cutoff $\chi$ (as in [32], [33], [20]) to avoid the need for multiple contour integration, relying on the somewhat simpler Fourier integral expansion instead. For instance, at such scales it is known from these methods that the average value of $\nu$ is $1 + o(1)$ (see e.g. [33]), and more generally a large family of linear correlations of $\nu$ with itself are also $1 + o(1)$ (see [18], [20]). Thus one can view $\nu$ as being close to 1 in a weak (averaged) sense, though of course in a pointwise sense $\nu$ will fluctuate tremendously.

$^{31}$ This differs slightly from the majorant introduced by Goldston and Yıldırım [14] and used in [18]: in our current notation, the majorants from those papers correspond to the case $\chi(t) := \max(1 - |t|, 0)$. It turns out that choosing $\chi$ to be smooth allows for some technical simplifications, at the (acceptable) cost of lowering $\eta_2 = \frac{\log R}{\log N}$ slightly.



It is easy to verify the pointwise bound $f(x) \le \nu(x)$. Indeed, from (72) it

suffices to verify that 



m\ Wx+b 

whenever $x \in [\frac{1}{2}N]$ and $Wx + b \in A$. But this is clear since $Wx + b$ is prime and greater than $R$. It is also easy to verify the bound (16), using the elementary result that the number of divisors of an integer $n$ is $O_\varepsilon(n^\varepsilon)$ for any $\varepsilon > 0$.
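To see concretely why the divisor sum collapses at primes larger than $R$, one can evaluate a truncated divisor sum of the form $\sum_{m | n} \mu(m)\chi(\log m / \log R)$ by brute force. The sketch below is hedged in two ways: it uses the non-smooth tent cutoff $\chi(t) = \max(1 - |t|, 0)$ (the Goldston-Yıldırım case, chosen only because it is easy to write down, not the smooth cutoff the paper actually uses), and it assumes the weight has the shape $\frac{\phi(W)}{W} \log R$ times the square of that sum. All function names and the small values of $W$, $b$, $R$ are illustrative.

```python
import math

def divisors(n):
    """All positive divisors of n, found by trial division."""
    small = [m for m in range(1, math.isqrt(n) + 1) if n % m == 0]
    return sorted(set(small + [n // m for m in small]))

def mobius(m):
    """Moebius function: (-1)^k for a product of k distinct primes, else 0."""
    sign, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0              # divisible by p^2
            sign = -sign
        p += 1
    return -sign if m > 1 else sign   # a leftover factor is one more prime

def chi(t):
    """Tent cutoff max(1 - |t|, 0); an assumption made for this sketch."""
    return max(1.0 - abs(t), 0.0)

def divisor_sum(n, R):
    """Sum over m | n of mu(m) * chi(log m / log R)."""
    return sum(mobius(m) * chi(math.log(m) / math.log(R)) for m in divisors(n))

def totient(W):
    """Euler's phi function by direct counting (fine for small W)."""
    return sum(1 for a in range(1, W + 1) if math.gcd(a, W) == 1)

def nu(x, W, b, R):
    """Assumed shape of the weight: phi(W)/W * log R * (divisor sum)^2."""
    return totient(W) / W * math.log(R) * divisor_sum(W * x + b, R) ** 2
```

For a prime $n > R$ the only divisors are $1$ and $n$, and the cutoff vanishes at $\log n / \log R > 1$, so only the term $m = 1$ survives.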

The remaining task is to verify that $\nu$ obeys both the polynomial forms condition (17) and the polynomial correlation condition (18) (note that (15) follows from (17)). We can of course take $N$ to be large compared with the parameters $\eta_0, \ldots, \eta_7$ and with the parameters $D', D'', K, \varepsilon$ (in the case of (18)), as the claim is trivial otherwise.



We begin with a minor reduction designed to eliminate the "wraparound" effects caused by working in the cyclic group $X = \mathbb{Z}/N\mathbb{Z}$ rather than the interval $[N]$. Let us define the truncated domain $X'$ to be the interval $X' := \{x \in \mathbb{Z} : \sqrt{N} \le x \le N - \sqrt{N}\}$ (say). From (16) we can replace the average in $X$ with the average in $X'$ in both (17) and (18) while only incurring an error of $o(1)$ or $o_{D', D'', K}(1)$ at worst. The point of restricting to $X'$ is that all the shifts which occur in (17) and (18) have size at most $O(M^{O(1/\eta_1)})$ or $O_{D', D'', K}(M^{O(1/\eta_1)})$, because of the hypotheses on the degree and coefficients of the polynomials and because all convex bodies are contained in a ball $B(0, M^2)$. By choice of $M$, these shifts are thus less than $\sqrt{N}$ and so we do not encounter any wraparound issues. Thus (17) is now equivalent to

$$\mathbb{E}_{\vec h \in \Omega \cap \mathbb{Z}^d}\, \mathbb{E}_{x \in X'} \prod_{j \in [J]} \nu(x + Q_j(\vec h)) = 1 + o_\varepsilon(1) \tag{73}$$

and (18) is similarly equivalent to

$$\mathbb{E}_{\vec n \in \Omega' \cap \mathbb{Z}^{D'}}\, \mathbb{E}_{\vec h \in \Omega'' \cap \mathbb{Z}^{D''}}\, \mathbb{E}_{x \in X'} \Big[\prod_{k \in [K]} \mathbb{E}_{\vec m \in \Omega \cap \mathbb{Z}^D} \prod_{j \in [J]} \nu(x + \vec P_j(\vec h) \cdot \vec m + \vec Q_{j,k}(\vec h) \cdot \vec n)\Big] \prod_{l \in [L]} \nu(x + \vec S_l(\vec h) \cdot \vec n) = 1 + o_{D', D'', K, \varepsilon}(1) \tag{74}$$

where $\nu$ is now viewed as a function on the integers rather than on $X = \mathbb{Z}/N\mathbb{Z}$, defined by (72).

We shall prove (73) and (74) in Section 11 and Section 12 respectively. Before we do so, let us first discuss what would happen if we tried to generalize these averages by considering the more general expression

$$\mathbb{E}_{\vec x \in \Omega \cap \mathbb{Z}^D} \prod_{j \in [J]} \nu(P_j(\vec x)) \tag{75}$$

where $D, J \ge 0$ are integers, $\Omega$ is a convex body in $\mathbb{R}^D$, and $P_1, \ldots, P_J \in \mathbb{Z}[x_1, \ldots, x_D]$ are polynomials of bounded degree and whose coefficients are not too large (say of size $O(W^{O(1)})$). In light of the linear correlation theory, one would generally expect these polynomial correlations to also be $1 + o(1)$ as long as the polynomials $P_1, \ldots, P_J$ were suitably "distinct" and the range $\Omega$ is suitably large.

There will however be some technical issues in establishing such a statement. For sake of exposition let us just discuss the case $J = 1$, so that we are averaging a single factor $\nu(P(\vec x))$ for some polynomial $P$ of $D$ variables $\vec x = (x_1, \ldots, x_D)$. Even in this simple case, two basic problems arise.

The first problem is that $\nu$ is not perfectly uniformly distributed modulo $p$ for all primes $p$. The "$W$-trick" of using $Wx + b$ instead of $x$ in (72) (and renormalizing by $\frac{\phi(W)}{W}$ to compensate) does guarantee a satisfactory uniform distribution of $\nu$ modulo $p$ for small primes $p < w$. However for larger primes $p > w$, it turns out that $\nu$ will generally avoid the residue class $\{x : Wx + b = 0 \bmod p\}$ and instead distribute itself uniformly among the other $p - 1$ residue classes. This corresponds to the basic fact that primes (and almost primes) are mostly coprime to any given modulus $p$. Because of this, the expected value of an expression such as $\nu(P(x))$ will increase from 1 to roughly $(1 - \frac{1}{p})^{-1}$ if we know that $WP(x) + b$ is coprime to $p$, and conversely it will drop to essentially zero if we know that $WP(x) + b$ is divisible by $p$. These two effects will essentially balance each other out, provided that the algebraic variety $\{x \in F_p^D : WP(x) + b = 0\}$ has the expected density of $\frac{1}{p} + O(\frac{1}{p^{3/2}})$ (say) over the finite field affine space $F_p^D$. The famous result of Deligne [7], [8], in which the Weil conjectures were proved, establishes this when $WP + b$ is non-constant and is absolutely irreducible modulo $p$ (i.e. irreducible over the algebraic closure of $F_p$). However, there can be some "bad" primes $p > w$ for which this irreducibility fails; a particularly "terrible" case arises when $p$ divides the polynomial $WP + b$, in which case the variety has density 1 in $F_p^D$ and the expected value of (75) drops to zero. This reflects the intuitive fact that $WP(x) + b$ is much less likely to be prime or almost prime if $WP + b$ itself is divisible by some prime $p$. The other bad primes $p$ do not cause such a severe change in the expectation (75), but can modify the expected answer of $1 + o(1)$ by a factor of $1 + O(\frac{1}{p}) = \exp(O(\frac{1}{p}))$, leading to a final value which is something like $\exp(O(\sum_{p \text{ bad}} \frac{1}{p}) + o(1))$. In most cases this expression will be in fact very close to one, because of the restriction $p > w$. However, the (very slow) divergence of the sum $\sum_p \frac{1}{p}$ means that there are some exceptional cases in which averages such as (75) are unpleasantly large. For instance, for any fixed $h \ne 0$, the average value of $\nu(x)\nu(x + h)$ over sufficiently coarse scales turns out to be $\exp(O(\sum_{p > w : p | h} \frac{1}{p}) + o(1))$, which can be arbitrarily large in the (very rare) case that $h$ contains many prime factors larger than $w$, the basic problem being that the algebraic variety $\{x \in F_p : (Wx + b)(W(x + h) + b) = 0\}$, which is normally empty, becomes unexpectedly large when $p > w$ and $p$ divides $h$. This phenomenon was already present in [18], leading in particular to the rather technical "correlation condition" for $\nu$.

The second problem, which is a new feature in the polynomial case compared with the previous linear theory, is that we will not necessarily be able to average all of the parameters $x_1, \ldots, x_D$ over coarse scales (e.g. at scales $O(M)$, $O(\sqrt{M})$ or $O(M^{1/4})$). Instead, some of the parameters will be only averaged over fine scales such as $O(H)$. At these scales, the elementary sieve theory methods we are employing cannot estimate the expression (75) directly; indeed, the problem then becomes analogous to that of understanding the distribution of primes in short intervals, which is notoriously difficult. Fortunately, we can proceed by first fixing the fine-scale parameters and using the sieve theory methods to compute the averages over the coarse-scale parameters rather precisely, leading to certain tractable divisor sums over "locally bad primes" which can then be averaged over fine scales. Here we will rely on a basic heuristic from algebraic geometry, which asserts that a "generic" slice of an algebraic variety by a linear subspace will have the same codimension as the original variety. In our context, this means that a prime which is "globally good" with respect to many parameters will also be "locally good" when freezing one or more parameters, for "most" choices of such parameters. We will phrase the precise versions of these statements as a kind of "combinatorial Nullstellensatz" (cf. [1]) in Appendix D. This effect lets us deal with the previous difficulty that the sum of $\frac{1}{p}$ over bad primes can occasionally be very large.

We have already mentioned the need to control the density of varieties such as 
$\{x \in F_p^D : WP(x) + 1 = 0\}$, which in general requires the Weil conjectures as
proven by Deligne. Fortunately, for the application to polynomial progressions, 
the polynomials P involved will always be linear in at least one of the coarse-scale 
variables. This makes the density of the algebraic variety much easier to compute, 
provided that the coefficients in this linear representation do not degenerate (either 
by the linear coefficient vanishing, or by the linear and constant coefficients sharing 
a common factor). Thus we are able to avoid using the Weil conjectures. In fact 
we will be able to proceed by rather elementary algebraic methods, without using 
modern tools from arithmetic algebraic geometry; see Appendix D.

8.2. Notation. We now set out some notation which will be used throughout the
proof of (73) and (74). If $p$ is a prime, we use $F_p$ to denote the finite field of $p$
elements.

If $P, Q$ lie in some ring $R$, we use $P | Q$ to denote the statement that $Q$ is a multiple
of $P$. An element of a ring is a unit if it is invertible, and irreducible^{32} if it is
not a unit, and cannot be written as the product of two non-units. A ring is
a unique factorization domain if every element is uniquely expressible as a finite
product of irreducibles, up to permutations and units. If $P_1, \ldots, P_J$ lie in a unique
factorization domain, we say that $P_1, \ldots, P_J$ are jointly coprime (or just coprime if
$J = 2$) if there exists no irreducible which divides all the $P_1, \ldots, P_J$, and pairwise
coprime if each pair $P_i, P_j$ is coprime for $1 \le i < j \le J$; thus pairwise coprime
implies jointly coprime, but not conversely.

As observed by Hilbert, if $R$ is a unique factorization domain, then so is $R[x]$ (thanks
to the Euclidean algorithm). In particular, $F_p[x_1, \ldots, x_D]$ and $Z[x_1, \ldots, x_D]$ are
unique factorization domains (with units $F_p \setminus \{0\}$ and $\{-1, +1\}$ respectively).



32 We shall reserve the term prime for the rational primes 2, 3, 5, 7, ... to avoid confusion.

Every polynomial in $R[x_1, \ldots, x_D]$ can of course be viewed as a function from $R^D$
to $R$. If $P \in Z[x_1, \ldots, x_D]$ is a polynomial and $N \ge 1$, we write $P \bmod N$ for the
associated polynomial in $Z_N[x_1, \ldots, x_D]$ formed by projecting all the coefficients
onto the ring $Z_N$; thus $P \bmod N$ can be viewed as a function from $Z_N^D$ to $Z_N$.
Note that this projection may alter the property of two or more polynomials being
jointly or pairwise coprime; the precise analysis of when this occurs will in fact be
a major focus of our arguments here.

It will be convenient to introduce the modified exponential function

$\mathrm{Exp}(x) := \max(e^x - 1, 0)$;

thus $\mathrm{Exp}(x) \sim x$ when $x$ is non-negative and small, while $\mathrm{Exp}(x) \sim e^x$ for $x$ large.
Observe the elementary inequalities

(76) $\mathrm{Exp}(x + y) \le \mathrm{Exp}(2x) + \mathrm{Exp}(2y); \qquad \mathrm{Exp}(x)^K = O_K(\mathrm{Exp}(Kx))$

for any $x, y \ge 0$ and $K \ge 1$.
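The inequalities (76) are easy to sanity-check numerically. The sketch below is illustrative only; the function names `Exp` and `check_inequalities` and the sample grid are assumptions made here. For the second inequality it tests the implied constant 1, which suffices because $(a + 1)^K \ge a^K + 1$ for $a \ge 0$ and $K \ge 1$.

```python
import math

def Exp(x):
    """Modified exponential Exp(x) = max(e^x - 1, 0)."""
    return max(math.expm1(x), 0.0)

def check_inequalities(xs, Ks, rel_tol=1e-12):
    """Check both inequalities in (76) on a grid of non-negative reals.

    Returns True if Exp(x + y) <= Exp(2x) + Exp(2y) for all x, y in xs and
    Exp(x)^K <= Exp(K x) for all x in xs and K in Ks (up to rounding).
    """
    for x in xs:
        for y in xs:
            lhs = Exp(x + y)
            rhs = Exp(2 * x) + Exp(2 * y)
            if lhs > rhs * (1 + rel_tol) + rel_tol:
                return False
        for K in Ks:
            if Exp(x) ** K > Exp(K * x) * (1 + rel_tol) + rel_tol:
                return False
    return True

if __name__ == "__main__":
    grid = [0.0, 0.01, 0.1, 0.5, 1.0, 2.0, 5.0]
    print(check_inequalities(grid, [1, 1.5, 2, 3]))
```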

9. Local estimates 

Before we establish correlation estimates for $\nu$ on the integers, we first need to
consider the analogous problem modulo p. To formalize this problem we introduce 
the following definition. 

Definition 9.1 (Local factor). Let $P_1, \ldots, P_J \in Z[x_1, \ldots, x_D]$ be polynomials with
integer coefficients. For any prime $p$, we define the (principal) local factor

$c_p(P_1, \ldots, P_J) := E_{x \in F_p^D} \prod_{j \in [J]} 1_{P_j(x) = 0 \bmod p}.$

We also define the complementary local factor

$\bar{c}_p(P_1, \ldots, P_J) := E_{x \in F_p^D} \prod_{j \in [J]} 1_{P_j(x) \neq 0 \bmod p}.$

Examples 9.2. If $P_1, \ldots, P_J$ are homogeneous linear forms on $F_p^D$, with total rank
$r$, then $c_p(P_1, \ldots, P_J) = p^{-r}$. If the forms are independent (thus $J = r$), then
$\bar{c}_p(P_1, \ldots, P_J) = (1 - \frac{1}{p})^J$. If $D = 1$, then the local factor $c_p(x^2 + 1)$ equals $2/p$
when $p = 1 \bmod 4$ and equals $0$ when $p = 3 \bmod 4$, by quadratic reciprocity. (When
$p = 2$, it is equal to $1/p$.) More generally, the Artin reciprocity law j2H] relates
Artin characters to certain local factors. Deligne's celebrated proof [7], [8] of the
Weil conjectures implies (as a very special case) that $c_p(P) = 1/p + O_{k,D}(1/p^{3/2})$
whenever $P \in Z[x_1, \ldots, x_D]$ determines a non-singular projective algebraic variety
over $F_p$. For instance, if $P = x_2^2 - x_1^3 - a x_1 - b$, so that $P$ determines an elliptic curve,
with discriminant $\Delta = -16(4a^3 + 27b^2)$ coprime to $p$, then $c_p(P) = 1/p + O(1/p^{3/2})$
(a classical result of Hasse). The Birch and Swinnerton-Dyer conjectures, if true,
would provide more precise information (though not of upper bound type) on the
error term in this case.
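For small primes the local factors can simply be computed by brute force. The following sketch is illustrative and not taken from the paper; the function names `local_factor` and `complementary_factor` are assumptions. It enumerates $F_p^D$ and counts points, using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def local_factor(polys, p, D):
    """Principal local factor c_p: the proportion of x in F_p^D at which
    every polynomial in `polys` vanishes modulo p."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(P(*x) % p == 0 for P in polys)
    )
    return Fraction(hits, p**D)

def complementary_factor(polys, p, D):
    """Complementary local factor: the proportion of x in F_p^D at which
    no polynomial in `polys` vanishes modulo p."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(P(*x) % p != 0 for P in polys)
    )
    return Fraction(hits, p**D)

if __name__ == "__main__":
    # c_p(x^2 + 1) for the first few primes.
    for p in (2, 3, 5, 7, 11, 13):
        print(p, local_factor([lambda x: x * x + 1], p, 1))
```

Calling `local_factor` with two independent linear forms in three variables reproduces the value $p^{-r}$ with $r = 2$ from the first example.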

Remark 9.3. The factor $c_p$ denotes the proportion of points on $F_p^D$ which lie on
the algebraic variety determined by the polynomials $P_1, \ldots, P_J$, while the complementary
factor $\bar{c}_p$ is the proportion of points in $F_p^D$ for which all the $P_1, \ldots, P_J$ are
coprime to $p$. Clearly these factors lie between 0 and 1; for instance when $J = 0$ we
have $c_p = 1$ and $\bar{c}_p = 0$. Our interest is to estimate $c_p$ for higher values of $J$. This
will be of importance when we come to the "global" estimates for $\prod_{j \in [J]} \nu(P_j(x))$
over various subsets of $Z^D$; heuristically, the average value of this expression should
be approximately the product of the complementary factors $\bar{c}_p$ as $p$ ranges over
primes.

From the inclusion-exclusion principle we have the identity

(77) $\bar{c}_p(P_1, \ldots, P_J) = \sum_{S \subseteq \{1, \ldots, J\}} (-1)^{|S|} c_p((P_j)_{j \in S})$

and so we can estimate the complementary local factors using the principal local
factors.
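The identity (77) can be confirmed by direct enumeration for small $p$. The sketch below is illustrative; the names `principal`, `complementary` and `inclusion_exclusion` are assumptions introduced here, and the sample polynomials are arbitrary.

```python
from fractions import Fraction
from itertools import combinations, product

def principal(polys, p, D):
    """c_p((P_j)_{j in S}): proportion of x in F_p^D where all polys vanish mod p.
    For an empty list this is 1, matching the case |S| = 0."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(P(*x) % p == 0 for P in polys)
    )
    return Fraction(hits, p**D)

def complementary(polys, p, D):
    """Proportion of x in F_p^D where none of the polys vanishes mod p."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(P(*x) % p != 0 for P in polys)
    )
    return Fraction(hits, p**D)

def inclusion_exclusion(polys, p, D):
    """Right-hand side of (77): signed sum of principal factors over subsets S."""
    J = len(polys)
    total = Fraction(0)
    for size in range(J + 1):
        for S in combinations(range(J), size):
            total += (-1) ** size * principal([polys[j] for j in S], p, D)
    return total

if __name__ == "__main__":
    sample = [lambda x, y: x * y + 1, lambda x, y: x + y, lambda x, y: x * x - y]
    for p in (3, 5, 7):
        print(p, complementary(sample, p, 2), inclusion_exclusion(sample, p, 2))
```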

As mentioned earlier, the precise estimation of $c_p(P_1, \ldots, P_J)$ for general $P_1, \ldots, P_J$
is intimately connected to a number of deep results in arithmetic algebraic geometry
such as the Weil conjectures and the Artin reciprocity law. Fortunately, for our
applications we will only need to know the $\frac{1}{p}$ coefficient of $c_p(P_1, \ldots, P_J)$ and can
neglect lower order terms. Also, we will be working in the case where each of the
polynomials $P_j$ is linear in at least one of the co-ordinates $x_1, \ldots, x_D$ of $x$ and
is "non-degenerate" in the other co-ordinates. In such a simplified context, we
will be able to control $c_p$ quite satisfactorily using only arguments from elementary
algebra. To state the results, we first need the notion of a prime $p$ being good, bad,
or terrible with respect to a collection of polynomials:

Definition 9.4 (Good prime). Let $P_1, \ldots, P_J \in Z[x_1, \ldots, x_D]$ be a collection of
polynomials. We say that a prime $p$ is good with respect to $P_1, \ldots, P_J$ if the
following hold:

• The polynomials $P_1 \bmod p, \ldots, P_J \bmod p$ are pairwise coprime.

• For each $1 \le j \le J$, there exists a co-ordinate $1 \le i_j \le D$ for which we
have the linear behavior

$P_j(x_1, \ldots, x_D) = P_{j,1}(x_1, \ldots, x_{i_j - 1}, x_{i_j + 1}, \ldots, x_D) x_{i_j} + P_{j,0}(x_1, \ldots, x_{i_j - 1}, x_{i_j + 1}, \ldots, x_D) \bmod p$

where $P_{j,1}, P_{j,0} \in F_p[x_1, \ldots, x_{i_j - 1}, x_{i_j + 1}, \ldots, x_D]$ are such that $P_{j,1}$ is non-zero
and coprime to $P_{j,0}$.

We say that a prime is bad if it is not good. We say that a prime is terrible if at
least one of the $P_j$ vanishes identically modulo $p$ (i.e. all the coefficients are divisible
by $p$). Note that terrible primes are automatically bad.

Our main estimate on the local factors is then as follows. 

Lemma 9.5 (Local estimates). Let $P_1, \ldots, P_J \in Z[x_1, \ldots, x_D]$ have degree at most
$d$, let $p$ be a prime, and let $S \subseteq \{1, \ldots, J\}$. Then:

(a) If $|S| = 0$, then $c_p((P_j)_{j \in S}) = 1$.

(b) If $|S| \ge 1$ and $p$ is not terrible, then $c_p((P_j)_{j \in S}) = O_{d,D,J}(\frac{1}{p})$.

(c) If $|S| = 1$ and $p$ is good, then $c_p((P_j)_{j \in S}) = \frac{1}{p} + O_{d,D,J}(\frac{1}{p^2})$.

(d) If $|S| > 1$ and $p$ is good, then $c_p((P_j)_{j \in S}) = O_{d,D,J}(\frac{1}{p^2})$.

(e) If $p$ is terrible, then $\bar{c}_p(P_1, \ldots, P_J) = 0$.

(f) If $p$ is not terrible, then $\bar{c}_p(P_1, \ldots, P_J) = 1 + O_{d,D,J}(\frac{1}{p})$.

The proof of this lemma involves only elementary algebra, but we defer it to
Appendix D so as not to disrupt the flow of the argument.

Remark 9.6. From (77) and Lemma 9.5(acd) we also have

$\bar{c}_p(P_1, \ldots, P_J) = 1 - \frac{J}{p} + O_{d,D,J}(\frac{1}{p^2})$

when $p$ is good. In practice we shall need a more sophisticated version of this fact,
when certain complex weights $p^{-\sum_{j \in S} z_j}$ are inserted into the right-hand side of
(77); see Lemma 10.4.
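For convenience, the computation behind Remark 9.6 can be spelled out as follows. This is a routine expansion of the inclusion-exclusion identity (77) using parts (a), (c), (d) of Lemma 9.5 for a good prime $p$; the implied constants are allowed to depend on $d$, $D$, $J$, and in particular on the number $2^J$ of subsets $S$.

```latex
\begin{align*}
\bar{c}_p(P_1,\ldots,P_J)
 &= \sum_{S \subseteq \{1,\ldots,J\}} (-1)^{|S|}\, c_p\big((P_j)_{j \in S}\big) \\
 &= \underbrace{1}_{|S| = 0}
    \;-\; \sum_{j=1}^{J} \underbrace{\Big(\frac{1}{p} + O_{d,D,J}\Big(\frac{1}{p^2}\Big)\Big)}_{|S| = 1}
    \;+\; \sum_{|S| \ge 2} \underbrace{O_{d,D,J}\Big(\frac{1}{p^2}\Big)}_{|S| > 1} \\
 &= 1 - \frac{J}{p} + O_{d,D,J}\Big(\frac{1}{p^2}\Big).
\end{align*}
```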



10. Initial correlation estimate 



To prove (73) and (74) we shall need the following initial estimate which handles
general polynomial averages of $\nu$ over large scales, but with an error term that can
get large if there are many "bad" primes present. More precisely, this section is
devoted to proving

Proposition 10.1 (Correlation estimate). Let $P_1, \ldots, P_J \in Z[x_1, \ldots, x_D]$ have
degree at most $d$ for some $J, D, d \ge 0$. Let $\Omega$ be a convex body in $R^D$ of inradius at
least $R^{4J+1}$. Let $\mathcal{P}_b$ be the set of primes $w \le p \le R^{\log R}$ which are bad with respect
to $WP_1 + b, \ldots, WP_J + b$, and let $\mathcal{P}_t \subseteq \mathcal{P}_b$ be the set of primes $w \le p \le R^{\log R}$
which are terrible (as defined in Definition 9.4). Then

(78) $E_{x \in \Omega \cap Z^D} \prod_{j \in [J]} \nu(P_j(x)) = 1_{\mathcal{P}_t = \emptyset} + o_{D,J,d}(1) + O_{D,J,d}(\mathrm{Exp}(O_{D,J,d}(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))$.

Remark 10.2. We only expect this estimate to be useful when the number of bad
primes is finite. This is equivalent to requiring that the polynomials $P_1, \ldots, P_J$ are
coprime, and each one is linear in at least one variable. Because the sum $\sum_p \frac{1}{p}$ is
(very slowly) divergent (see (110)), the last error term can be unpleasantly large on
occasion, but in practice we will be able to introduce averaging over additional
parameters which will make the effect of the error small on average, the point
being that the sets $\mathcal{P}_t, \mathcal{P}_b$ are generically rather small. The radius $R^{4J+1}$ is not
best possible, but to lower it too much would require some deep analytical number
theory estimates such as the Bombieri-Vinogradov inequality which we shall avoid
using here. The upper bound $R^{\log R}$ (which was not present in earlier work) can
also be lowered, but for our purposes any bound which is subexponential in $R$ will
suffice.

Remark 10.3. All the primes $p < w$ will be bad (but not terrible); however their
contribution will be almost exactly canceled by the $\frac{\phi(W)}{W}$ term present in $\nu$ and
we do not need to include them into $\mathcal{P}_b$. Even a single terrible prime will cause
the main term $1_{\mathcal{P}_t = \emptyset}$ to vanish (basically because one of the $P_j(x)$ will now be
inherently composite and so will be unlikely to have a large value of $\nu$), which will
make asymptotics difficult; however, terrible primes are no worse than merely bad
primes for the purposes of upper bounds.



Proof of Proposition 10.1. Throughout this proof we fix $D, J, d$, and allow the implied
constants in the $O()$ and $o()$ notation to depend on these parameters. We will also
always assume $R$ to be sufficiently large depending on $D, J, d$.

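We expand out the left-hand side using (72) as

(79) $\left(\frac{\phi(W) \log R}{W}\right)^J \sum_{m_1, m'_1, \ldots, m_J, m'_J \ge 1} \left(\prod_{j \in [J]} \mu(m_j) \mu(m'_j) \chi\left(\frac{\log m_j}{\log R}\right) \chi\left(\frac{\log m'_j}{\log R}\right)\right) E_{x \in \Omega \cap Z^D} \prod_{j=1}^{J} 1_{\mathrm{lcm}(m_j, m'_j) | WP_j(x) + b}.$

Here of course lcm() denotes least common multiple. Note that the presence of the
$\mu$ and $\chi$ factors allows us to restrict $m_1, \ldots, m'_J$ to be square-free and at most $R$.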

The first task is to eliminate the role of the convex body $\Omega$, taking advantage of the
large inradius assumption. Let $M := \mathrm{lcm}(m_1, m'_1, \ldots, m_J, m'_J)$; thus $M$ is square-free
and at most $R^{2J}$. The function $x \mapsto \prod_{j \in [J]} 1_{\mathrm{lcm}(m_j, m'_j) | WP_j(x) + b}$ is periodic with
respect to the lattice $M \cdot Z^D$, and thus can be meaningfully defined on the group
$Z_M^D$. Applying Corollary C.3 (recalling that $\Omega$ is assumed to have inradius at least
$R^{4J+1}$), we thus have

$E_{x \in \Omega \cap Z^D} \prod_{j=1}^{J} 1_{\mathrm{lcm}(m_j, m'_j) | WP_j(x) + b} = \left(1 + O\left(\frac{1}{R^{2J+1}}\right)\right) E_{y \in Z_M^D} \prod_{j=1}^{J} 1_{\mathrm{lcm}(m_j, m'_j) | WP_j(y) + b}.$

Let us first dispose of the error term $O(\frac{1}{R^{2J+1}})$. The contribution of this term to
the above expectation can be crudely bounded by $O(R^{-2J-1})$, and so the contribution
of this term to (79) can be crudely bounded by

$O\left(\log^J R \sum_{1 \le m_1, m'_1, \ldots, m_J, m'_J \le R} R^{-2J-1}\right) = O(R^{-1} \log^J R) = o(1).$

Thus we may discard this error, and reduce to showing that



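(80) $\left(\frac{\phi(W) \log R}{W}\right)^J \sum_{m_1, m'_1, \ldots, m_J, m'_J \ge 1} \left(\prod_{j \in [J]} \mu(m_j) \mu(m'_j) \chi\left(\frac{\log m_j}{\log R}\right) \chi\left(\frac{\log m'_j}{\log R}\right)\right) \alpha_{\mathrm{lcm}(m_1, m'_1), \ldots, \mathrm{lcm}(m_J, m'_J)}$
$= 1_{\mathcal{P}_t = \emptyset} + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))$

where $\alpha_{\mathrm{lcm}(m_1, m'_1), \ldots, \mathrm{lcm}(m_J, m'_J)}$ is the local factor

$\alpha_{\mathrm{lcm}(m_1, m'_1), \ldots, \mathrm{lcm}(m_J, m'_J)} := E_{y \in Z_M^D} \prod_{j=1}^{J} 1_{\mathrm{lcm}(m_j, m'_j) | WP_j(y) + b}.$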

Observe from the Chinese remainder theorem that $\alpha$ is multiplicative, so that if
$\mathrm{lcm}(m_j, m'_j) = \prod_p p^{r_j}$ then

$\alpha_{\mathrm{lcm}(m_1, m'_1), \ldots, \mathrm{lcm}(m_J, m'_J)} = \prod_p \alpha_{p^{r_1}, \ldots, p^{r_J}}$

(note that all but finitely many of the terms in the product are 1). If the $m_1, \ldots, m'_J$
are squarefree then the $r_j$ are either zero or one, and we simplify further to

$\alpha_{\mathrm{lcm}(m_1, m'_1), \ldots, \mathrm{lcm}(m_J, m'_J)} = \prod_p c_p((WP_j + b)_{r_j = 1})$

where the local factors $c_p$ are defined in Definition 9.1, and the
dummy variable $j$ ranges over all indices for which $r_j = 1$. Also, since the
$m_j, m'_j$ are bounded by $R$, we may certainly restrict the primes $p$ to be less than
$R^{\log R}$ without difficulty.
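The multiplicativity of $\alpha$ for squarefree moduli can be checked by enumeration in small cases. The sketch below is illustrative only: the names `alpha`, `local_factor`, `prime_divisors` and `product_of_local_factors` are assumptions, each callable `Q` stands for one polynomial $WP_j + b$, and `math.lcm` with several arguments assumes Python 3.9 or later.

```python
from fractions import Fraction
from itertools import product
from math import lcm  # multi-argument lcm requires Python 3.9+

def prime_divisors(n):
    """Sorted list of the primes dividing n (trial division)."""
    primes, d = [], 2
    while d * d <= n:
        if n % d == 0:
            primes.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        primes.append(n)
    return primes

def alpha(polys, moduli, D):
    """Proportion of y in (Z/M)^D with moduli[j] | polys[j](y) for every j,
    where M is the least common multiple of the moduli."""
    M = lcm(*moduli)
    hits = sum(
        1
        for y in product(range(M), repeat=D)
        if all(Q(*y) % n == 0 for Q, n in zip(polys, moduli))
    )
    return Fraction(hits, M**D)

def local_factor(polys, p, D):
    """c_p: proportion of x in F_p^D at which all given polynomials vanish mod p."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(Q(*x) % p == 0 for Q in polys)
    )
    return Fraction(hits, p**D)

def product_of_local_factors(polys, moduli, D):
    """Product over primes p | M of c_p((Q_j)_{j : p | moduli[j]})."""
    result = Fraction(1)
    for p in prime_divisors(lcm(*moduli)):
        active = [Q for Q, n in zip(polys, moduli) if n % p == 0]
        result *= local_factor(active, p, D)
    return result
```

For squarefree moduli the two functions `alpha` and `product_of_local_factors` agree exactly, which is the content of the Chinese remainder theorem step above.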

The next step is to replace the $\chi$ factors by terms which are multiplicative in the
$m_j, m'_j$. Since $\chi$ is smooth and compactly supported, we have the Fourier expansion

(81) $e^x \chi(x) = \int_{-\infty}^{\infty} \varphi(\xi) e^{-ix\xi}\, d\xi$

for some smooth, rapidly decreasing function $\varphi(\xi)$ (so in particular $\varphi(\xi) = O_A((1 +
|\xi|)^{-A})$ for any $A > 0$). For future reference, we observe that (81) and the hypotheses
on $\chi$ will imply the identity



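(82) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{(1 + i\xi)(1 + i\xi')}{2 + i\xi + i\xi'} \varphi(\xi) \varphi(\xi')\, d\xi\, d\xi' = 1$

(see [20, Lemma D.2], or the proof of [33, Proposition 2.2]).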

We follow the arguments in [22], [2], [18], [20], except that for technical reasons
(having to do with the terrible primes) we will be unable to truncate the $\xi$ variables.
From (81) we have



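$\chi\left(\frac{\log m_j}{\log R}\right) = \int_{-\infty}^{\infty} m_j^{-z_j} \varphi(\xi_j)\, d\xi_j, \qquad \chi\left(\frac{\log m'_j}{\log R}\right) = \int_{-\infty}^{\infty} (m'_j)^{-z'_j} \varphi(\xi'_j)\, d\xi'_j$

where we adopt the notational conventions

$z_j := (1 + i\xi_j)/\log R; \qquad z'_j := (1 + i\xi'_j)/\log R.$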

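Our task is thus to show that

(83) $\left(\frac{\phi(W) \log R}{W}\right)^J \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \sum_{m_1, m'_1, \ldots, m_J, m'_J \ge 1} \left(\prod_{j \in [J]} \mu(m_j) \mu(m'_j) m_j^{-z_j} (m'_j)^{-z'_j} \varphi(\xi_j) \varphi(\xi'_j)\, d\xi_j\, d\xi'_j\right) \prod_{p \le R^{\log R}} c_p((WP_j + b)_{r_j = 1})$
$= 1_{\mathcal{P}_t = \emptyset} + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p}))).$

The left-hand side can be factorized as

$\left(\frac{\phi(W) \log R}{W}\right)^J \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left(\prod_{p \le R^{\log R}} E_p\right) \prod_{j \in [J]} \varphi(\xi_j) \varphi(\xi'_j)\, d\xi_j\, d\xi'_j$

where $E_p = E_p(z_1, \ldots, z'_J)$ is the Euler factor

$E_p := \sum_{m_1, \ldots, m'_J \in \{1, p\}} \left(\prod_{j \in [J]} \mu(m_j) \mu(m'_j) m_j^{-z_j} (m'_j)^{-z'_j}\right) c_p((WP_j + b)_{r_j = 1}).$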

Note that if the $z_j, z'_j$ were zero, then this would just be the complementary factor
$\bar{c}_p(WP_1 + b, \ldots, WP_J + b)$ defined in Definition 9.1; see (77). Of course, $z_j, z'_j$ are
non-zero. To approximate $E_p$ in this case we introduce the Euler factor

$E'_p := \prod_{j \in [J]} \frac{(1 - 1/p^{1 + z_j})(1 - 1/p^{1 + z'_j})}{1 - 1/p^{1 + z_j + z'_j}}.$

Note that $E'_p$ never vanishes.
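The relationship between the two Euler factors can be explored numerically. The sketch below is illustrative and not from the paper: the names `local_factor`, `euler_factor` and `euler_factor_model` are assumptions, each callable in `polys` stands for one polynomial $WP_j + b$, and the choice of shifts and of the prime 101 is arbitrary. At $z_j = z'_j = 0$ the factor $E_p$ reduces to the complementary local factor, and for a good prime the ratio $E_p / E'_p$ differs from 1 by a quantity of size about $1/p^2$.

```python
from itertools import product

def local_factor(polys, p, D):
    """c_p: proportion of x in F_p^D where all given polynomials vanish mod p."""
    hits = sum(
        1
        for x in product(range(p), repeat=D)
        if all(Q(*x) % p == 0 for Q in polys)
    )
    return hits / p**D

def euler_factor(polys, p, D, z, z_prime):
    """E_p: sum over m_j, m'_j in {1, p} of
    prod_j mu(m_j) mu(m'_j) m_j^{-z_j} (m'_j)^{-z'_j} * c_p((Q_j)_{r_j = 1}),
    where r_j = 1 exactly when lcm(m_j, m'_j) = p."""
    J = len(polys)
    total = 0j
    for exps in product((0, 1), repeat=2 * J):
        e, e_prime = exps[:J], exps[J:]  # m_j = p^e[j], m'_j = p^e_prime[j]
        weight = 1 + 0j
        for j in range(J):
            sign = (-1) ** (e[j] + e_prime[j])  # mu(1) = 1, mu(p) = -1
            weight *= sign * p ** (-(e[j] * z[j] + e_prime[j] * z_prime[j]))
        active = [polys[j] for j in range(J) if e[j] or e_prime[j]]
        total += weight * local_factor(active, p, D)
    return total

def euler_factor_model(p, z, z_prime):
    """E'_p = prod_j (1 - p^{-1-z_j})(1 - p^{-1-z'_j}) / (1 - p^{-1-z_j-z'_j})."""
    result = 1 + 0j
    for zj, zpj in zip(z, z_prime):
        numerator = (1 - p ** (-1 - zj)) * (1 - p ** (-1 - zpj))
        result *= numerator / (1 - p ** (-1 - zj - zpj))
    return result

if __name__ == "__main__":
    shifts = [lambda x: x + 1, lambda x: x + 4]
    p = 101
    z = [0.05 + 0.1j, 0.05 - 0.2j]
    print(euler_factor(shifts, p, 1, z, z) / euler_factor_model(p, z, z))
```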



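Lemma 10.4 (Euler product estimate). We have

$\prod_{p \le R^{\log R}} \frac{E_p}{E'_p} = \left(1_{\mathcal{P}_t = \emptyset} + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))\right) \left(\frac{W}{\phi(W)}\right)^J.$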

Proof. For $p < w$, we directly compute (since $w$ is slowly growing compared to $R$,
and $WP_j + b$ is equal modulo $p$ to $b$, which is coprime to $p$) that

$E_p = 1 + o(1)$ and $E'_p = (1 - \frac{1}{p})^J + o(1)$

and hence (again because $w$ is slowly growing)

$\prod_{p < w} \frac{E_p}{E'_p} = (1 + o(1)) \left(\frac{W}{\phi(W)}\right)^J.$

Thus it will suffice to show that

$\prod_{w \le p \le R^{\log R}} \frac{E_p}{E'_p} = 1_{\mathcal{P}_t = \emptyset} + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p}))).$
For $p$ terrible, Lemma 9.5 gives the estimate

E p = O(-) = 0(-4) 
p p p 

and so it will suffice to show 

w< P <R l °g R , not terrible p \ \ P eP t / / 

For $p$ bad but not terrible, Lemma 9.5 gives the crude estimate

$E_p = 1 + O(1/p) = \exp(O(1/p)) E'_p$

and thus

$\prod_{w \le p \le R^{\log R},\ p \text{ bad but not terrible}} \frac{E_p}{E'_p} = \exp(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p})).$

Thus it suffices to show that

$\prod_{w \le p \le R^{\log R},\ p \text{ good}} \frac{E_p}{E'_p} = 1 + o(1).$

Since the product $\prod_p (1 + O(\frac{1}{p^2}))$ is convergent, and $w$ goes to infinity, it in fact
suffices to show that

$E_p = (1 + O(\frac{1}{p^2})) E'_p$

for all good primes larger than $w$. But this easily follows from Lemma 9.5 and
Taylor expansion (recall that the real parts of $z_j, z'_j$ are $1/\log R > 0$). □

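Now we use the theory of the Riemann zeta function. From the Euler product formula we have

$\prod_p E'_p = \prod_{j \in [J]} \frac{\zeta(1 + z_j + z'_j)}{\zeta(1 + z_j) \zeta(1 + z'_j)}.$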



On the other hand, from (108) we have



= (i+ ((i+iein)- 



C(l + (l + if)/logi2) v vv log-R 
and 

C(l + (l + zO/logi?) = (l + ((l + |ei) 2 )) 



2 \ \ lo g^ 



1 + tC 






-,<) 



for any real $\xi$, and hence



n E' p =(i+o((i+\^\+\^\f)) 



p < R logR 

Applying Lemma 10.4 we conclude



i (i + »e ? )(i + »g;.) 

logi? 2 + 



w 



\ogR 



n e ? 

p < fl logH 



n (i + 101 + i^i) 6 j + o ( Exp (o\^ g i) m 



'pj-c + " I I! CL + K; )" I +0(En P |c>| > . -) I I I J] 

je[J] 

Thus by the triangle inequality, to show (83) it will suffice to show that

1 



2 + t& + iCi 



R 



R 



MA 



2 + + i£ 



and 



II i^o)ii^)i(i + ioi + i^if ^S^ = oi i ). 



2 + t&+t£. 



But the first estimate follows from (82), while the second estimate follows from the
rapid decrease of $\varphi$. This proves Proposition 10.1. □



To illustrate the above proposition, let us specialize to the case of monic linear 
polynomials of one variable (this case was essentially treated in JS] or |14p. 

Corollary 10.5 (Correlation condition). Let $h_1, \ldots, h_J$ be integers, and let $I \subset R$
be an interval of length at least $R^{4J+1}$. Then

$E_{x \in I \cap Z} \prod_{j \in [J]} \nu(x + h_j) = 1 + o_{D,J,d}(1) + O_{D,J,d}(\mathrm{Exp}(O_{D,J,d}(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))$

where $\mathcal{P}_b := \{w \le p \le R^{\log R} : p | h_j - h_{j'} \text{ for some } 1 \le j < j' \le J\}$.

Proof. Apply Proposition 10.1 with $P_j(x) := x + h_j$ and $\Omega := I$. Then there are
no terrible primes, and the only bad primes larger than $w$ are those which divide
$h_j - h_{j'}$ for some $1 \le j < j' \le J$. □
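In the monic linear case the set of bad primes is completely explicit, so the size of the error term is easy to tabulate. The sketch below is illustrative only: the names `prime_factors`, `bad_primes` and `bad_prime_sum` are assumptions, and it assumes the shifts are distinct and that the upper cutoff $R^{\log R}$ exceeds every difference $|h_j - h_{j'}|$, so that only the lower cutoff $w$ matters.

```python
from itertools import combinations

def prime_factors(n):
    """Set of prime divisors of a positive integer n (trial division)."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def bad_primes(shifts, w):
    """Sorted list of primes p >= w dividing h_j - h_j' for some j < j'.

    Assumes the shifts are pairwise distinct; a repeated shift would make
    every prime bad, so that case is rejected.
    """
    bad = set()
    for h, h2 in combinations(shifts, 2):
        if h == h2:
            raise ValueError("shifts must be pairwise distinct")
        bad |= {p for p in prime_factors(abs(h - h2)) if p >= w}
    return sorted(bad)

def bad_prime_sum(shifts, w):
    """Sum of 1/p over the bad primes; the quantity inside Exp(O(...))."""
    return sum(1.0 / p for p in bad_primes(shifts, w))

if __name__ == "__main__":
    # A difference with many prime factors above w gives a larger sum.
    print(bad_primes([0, 30030], 7), bad_prime_sum([0, 30030], 7))
    print(bad_primes([0, 2, 6], 3), bad_prime_sum([0, 2, 6], 3))
```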

This can already be used to derive the "correlation condition" in [18]; a similar
application of Proposition 10.1 also gives the "linear forms condition" from that
paper. We will also need the following variant of the above estimate:

Corollary 10.6 (Correlation condition on progressions). Let $h_1, \ldots, h_J$ be integers,
let $q \ge 1$, let $a \in Z_q$, and let $I \subset R$ be an interval of length at least $qR^{4J+1}$. Then

$E_{x \in I \cap Z; x = a \bmod q} \prod_{j \in [J]} \nu(x + h_j) = O_{D,J,d}(\exp(O_{D,J,d}(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))$

where

(84) $\mathcal{P}_b := \{w \le p \le R^{\log R} : p | h_j - h_{j'} \text{ for some } 1 \le j < j' \le J\} \cup \{p > w : p | q\}$.

Proof. Apply Proposition 10.1 with $P_j(x) := (qx + a) + h_j$ and with $\Omega := \{x \in R :
qx + a \in I\}$. Then the bad primes are those which divide $h_j - h_{j'}$ or which divide
$q$. (There are terrible primes if $a$ and $q$ are not coprime, but this will not affect the
upper bound. One can get more precise estimates as in Corollary 10.5 but we will
not need them here.) □



This in turn implies 

Corollary 10.7 (Correlation condition with periodic weight). Let $h_1, \ldots, h_J$ be
integers, let $q \ge 1$, let $I \subset R$ be an interval of length at least $qR^{4J+1}$, and let
$f : Z \to R^+$ be periodic modulo $q$ (and thus definable on $Z_q$). Then

$E_{x \in I \cap Z} f(x) \prod_{j \in [J]} \nu(x + h_j) = O_{D,J,d}\left((E_{y \in Z_q} f(y)) \exp(O_{D,J,d}(\sum_{p \in \mathcal{P}_b} \frac{1}{p}))\right)$

where $\mathcal{P}_b$ was defined in (84).

Proof. The left-hand side can be bounded by

$E_{y \in Z_q} f(y) E_{x \in I \cap Z; x = y \bmod q} \prod_{j \in [J]} \nu(x + h_j)$

simply because each set $\{x \in I \cap Z : x = y \bmod q\}$ has cardinality roughly $\frac{1}{q}$ that of
$I \cap Z$ (by the hypotheses on the length of $I$). The claim then follows from Corollary
10.6. □



11. The polynomial forms condition 



In this section we use the above correlation estimates to prove the polynomial forms
condition (73). We begin with a preliminary bound in this direction.

Theorem 11.1 (Polynomial forms condition). Let $M, D, d, J \ge 0$ and $\varepsilon > 0$. Let
$P_1, \ldots, P_J \in Z[m_1, \ldots, m_D]$ be polynomials of degree $d$ with all coefficients of size
at most $W^M$. Let $I \subset R$ be any interval of length at least $R^{4J+1}$, and let $\Omega \subset R^D$ be
any convex body of inradius at least $R^{\varepsilon}$. Then

$E_{x \in I \cap Z; \vec{m} \in \Omega \cap Z^D} \prod_{j \in [J]} \nu(x + P_j(\vec{m})) = 1 + o_{D,d,\varepsilon,J}(1) + O_{D,d,\varepsilon,J}(\mathrm{Exp}(O_{D,d,\varepsilon,J}(\sum_{p \in \mathcal{P}_b} \frac{1}{p})))$

where $\mathcal{P}_b$ denotes the set of all $w \le p \le R^{\log R}$ which are "globally bad" in the sense
that $p | P_j - P_{j'}$ for some $1 \le j < j' \le J$.

Proof. Let us fix $D, d, J, \varepsilon$, and allow all implied constants to depend on these
quantities. From Corollary 10.5 we have

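$E_{x \in I \cap Z} \prod_{j \in [J]} \nu(x + P_j(\vec{m})) = 1 + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_{\vec{m}}} \frac{1}{p})))$

for all $\vec{m} \in \Omega \cap Z^D$, where $\mathcal{P}_{\vec{m}}$ is the collection of primes $w \le p \le R^{\log R}$ such
that $p | P_j(\vec{m}) - P_{j'}(\vec{m})$ for some $1 \le j < j' \le J$. Thus it suffices to show that

$E_{\vec{m} \in \Omega \cap Z^D} \mathrm{Exp}(O(\sum_{p \in \mathcal{P}_{\vec{m}}} \frac{1}{p})) = o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b} \frac{1}{p}))).$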

Applying (76), we reduce to showing that

$E_{\vec{m} \in \Omega \cap Z^D} \mathrm{Exp}(O(\sum_{p \in \mathcal{P}_{\vec{m}} \setminus \mathcal{P}_b} \frac{1}{p})) = o(1).$

Applying Lemma E.1 it suffices to show that

$\sum_{w \le p \le R^{\log R}; p \notin \mathcal{P}_b} E_{\vec{m} \in \Omega \cap Z^D} 1_{p \in \mathcal{P}_{\vec{m}}} \frac{\log^{O(1)} p}{p} = o(1).$

From (112) it will suffice to establish the bounds

E mefinz D lp£P™ = O(-) + 0{—) 

for any $w \le p \le R^{\log R}$ with $p \notin \mathcal{P}_b$ (note that $\log(R^{\log R})^{O(1)} = o(R^{\varepsilon})$). By the
triangle inequality it suffices to show that

E m6finZ D lp|P 3 (m)-P 3 ,(m) = O(-) + ^( — ) 

for all $1 \le j < j' \le J$.

Fix $j, j'$. Observe that the property $p | P_j(\vec{m}) - P_{j'}(\vec{m})$ is periodic in each component
of $\vec{m}$ of period $p$, and can thus meaningfully be defined for $\vec{m} \in F_p^D$. Applying
Corollary C.3 (for $p \ll R^{\varepsilon}$) or Lemma C.4 (for $p \gg R^{\varepsilon}$) it will thus suffice to show
the bound

$E_{m_1 \in A_1, \ldots, m_D \in A_D} 1_{p | P_j(m_1, \ldots, m_D) - P_{j'}(m_1, \ldots, m_D)} = O(\frac{1}{M})$

for all subsets $A_1, \ldots, A_D$ in $F_p$ of size at least $M \ge 1$ for some $M$. But since
$p \notin \mathcal{P}_b$, the polynomial $P_j - P_{j'}$ does not vanish modulo $p$, and this claim follows
from Lemma D.3. □



We can improve the error term if the coefficients of the polynomials are not too 
large: 

Corollary 11.2 (Polynomial forms condition, again). Let $M, D, d, J \ge 0$ and $\varepsilon >
0$. Let $P_1, \ldots, P_J \in Z[m_1, \ldots, m_D]$ be distinct polynomials of degree $d$ with all
coefficients of size at most $W^M$. Let $I \subset R$ be any interval of length at least
$R^{4J+1}$, and let $\Omega \subset R^D$ be any convex body of inradius at least $R^{\varepsilon}$. Then

$E_{x \in I \cap Z; \vec{m} \in \Omega \cap Z^D} \prod_{j \in [J]} \nu(x + P_j(\vec{m})) = 1 + o_{D,d,\varepsilon,J,M}(1).$

Proof. Let $\mathcal{P}_b$ denote the set of all $w \le p \le R^{\log R}$ such that $p | P_j - P_{j'}$ for some
$1 \le j < j' \le J$. Since $P_j - P_{j'}$ is nonzero, this $p$ must then divide a non-zero
difference of two of the coefficients of the $P_j$, which is $O(W^M)$. Thus the total
product of all such $p$ is at most $O(W^{O(1)})$, and hence by Lemma E.3 we have
$\sum_{p \in \mathcal{P}_b} \frac{1}{p} = o(1)$. The claim now follows from Theorem 11.1. □

From Corollary 11.2 the desired estimate (73) quickly follows.



12. The polynomial correlation condition 

Now we use the estimates from Section 10 to prove the polynomial correlation
condition (74). It will suffice to prove the following estimate.

Theorem 12.1 (Polynomial correlation condition). Let $B, D, D', D'', d, J, K, L \ge 0$
and $\varepsilon > 0$. For any $j \in [J]$, $k \in [K]$, $l \in [L]$, let

$\vec{P}_j \in Z[h_1, \ldots, h_{D''}]^D$
$\vec{Q}_{j,k} \in Z[h_1, \ldots, h_{D''}]^{D'}$
$\vec{S}_l \in Z[h_1, \ldots, h_{D''}]^{D'}$

be polynomials obeying the following conditions.

• For any $1 \le j < j' \le J$ and $1 \le k \le K$, the vector-valued polynomials
$(\vec{P}_j, \vec{Q}_{j,k})$ and $(\vec{P}_{j'}, \vec{Q}_{j',k})$ are not parallel.

• The coefficients of the $P_{j,d}$ and $S_{l,d'}$ are bounded in magnitude by $W^B$.

• The vector-valued polynomials $\vec{S}_l$ are distinct as $l$ varies in $[L]$.

Let $I \subset R$ be any interval of length at least $R^{4L+1}$, let $\Omega \subset R^D$ be a bounded convex
body of inradius at least $R^{8J+2}$, and let $\Omega' \subset R^{D'}$ and $\Omega'' \subset R^{D''}$ have inradii at
least $R^{\varepsilon}$. Suppose also that $\Omega''$ is contained in the ball $B(0, R^B)$. Then

(85) $E_{x \in I, \vec{n} \in \Omega' \cap Z^{D'}, \vec{h} \in \Omega'' \cap Z^{D''}} \left(\prod_{k \in [K]} E_{\vec{m} \in \Omega \cap Z^D} \prod_{j \in [J]} \nu(x + \vec{P}_j(\vec{h}) \cdot \vec{m} + \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n})\right) \prod_{l \in [L]} \nu(x + \vec{S}_l(\vec{h}) \cdot \vec{n})$
$= 1 + o_{B,D,D',D'',d,J,K,L,\varepsilon}(1).$

Proof. We repeat the same strategy of proof as in the preceding section. We fix
$B, D, D', D'', d, J, K, L, \varepsilon$ and allow implicit constants to depend on these parameters.
Thus for instance the right-hand side of (85) is now simply $1 + o(1)$.

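We begin by fixing $k, x, \vec{h}, \vec{n}$ and considering a single average

$E_{\vec{m} \in \Omega \cap Z^D} \prod_{j \in [J]} \nu(x + \vec{P}_j(\vec{h}) \cdot \vec{m} + \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}).$

By Proposition 10.1 this average is

(86) $1_{\mathcal{P}_t[k, x, \vec{h}, \vec{n}] = \emptyset} + o(1) + O(\mathrm{Exp}(O(\sum_{p \in \mathcal{P}_b[k, x, \vec{h}, \vec{n}]} \frac{1}{p})))$

where $\mathcal{P}_t[k, x, \vec{h}, \vec{n}]$ is the collection of primes $w \le p \le R^{\log R}$ which are terrible
with respect to the linear polynomials

(87) $W \times [x + \vec{P}_j(\vec{h}) \cdot \vec{m} + \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}] + b \in Z[m_1, \ldots, m_D]$ for $j \in [J]$,

and $\mathcal{P}_b[k, x, \vec{h}, \vec{n}]$ is the collection of primes which are bad. We can thus express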

nnz D Yl v ( x + (h) ■ m + Q hk (h) ■ ft) 

ka[K] is [J] 

using ({TrJj) as 

1 p t [fe,x,M=0forallfc + °( 1 ) + O Ex P E I 



P 

PSUf=i Pb[k,x,h,n] 



which we estimate crudely by 

l +0 (l)+0(£ ^ l)+0 Exp O J2 I 

\'!6[- K "]pePt[fe,x,h,n] / \ \ \pe[J kelK] Pb[k,x,h,n]\Ptlk,x,h:n] 

Observe that if $p > w$ is terrible for (87), then $p | \vec{P}_j(\vec{h})$ and $p | x + \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}$ for
some $j \in [J]$, while if $p > w$ is bad but not terrible, then

$p | (\vec{P}_j(\vec{h}), \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}) \wedge (\vec{P}_{j'}(\vec{h}), \vec{Q}_{j',k}(\vec{h}) \cdot \vec{n})$

for some $1 \le j < j' \le J$, where $\wedge$ denotes the wedge product on $D + 1$-dimensional
space. Thus we may estimate the preceding sum (using (76)) by

i+o(i)+ £ E °( E !) 1/2 

1/2 

£ 

At this point we pause to remove some "globally bad" primes. Let $\mathcal{P}_b$ denote the
primes $w \le p \le R^{\log R}$ which divide $(\vec{P}_j, \vec{Q}_{j,k}) \wedge (\vec{P}_{j'}, \vec{Q}_{j',k})$ for some $1 \le j < j' \le J$
and $k \in [K]$ (note this is now the wedge product in $D + D'$ dimensions). Because the
wedge products $(\vec{P}_j, \vec{Q}_{j,k}) \wedge (\vec{P}_{j'}, \vec{Q}_{j',k})$ are non-zero and have coefficients $O(W^{O(1)})$,





( 


f I 


E o 


Exp 





i<j<r<j 


V 


\ \ 
the product of all these primes is $O(W^{O(1)})$, and hence by Lemma E.3 we have
$\sum_{p \in \mathcal{P}_b} \frac{1}{p} = o(1)$. Thus we may safely delete these primes from the expression

inside the Exp(). If we then apply Lemma E.1 we can bound the above sum as



i+ (i) + £o Yl 

j£[J] \w<p<R 1 °sR:p\p j (h). x+ Q :ltk (h).n 

+ T oi V 

U \ „ 1 -p\(P j (h),Q :jtk (h)-ii)A(P ] ,(h),Q ] , ik (h)-n) 

l<j<j'<J \nj<p<ff»s H ;! )^P b 

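Inserting this bound into (85) and using Cauchy-Schwarz, we reduce to showing
the bounds

(88) $E_{x \in I; \vec{n} \in \Omega'; \vec{h} \in \Omega''} \prod_{l \in [L]} \nu(x + \vec{S}_l(\vec{h}) \cdot \vec{n}) = 1 + o(1)$

(89) $E_{x \in I; \vec{n} \in \Omega'; \vec{h} \in \Omega''} \sum_{w \le p \le R^{\log R}: p | \vec{P}_j(\vec{h}),\, x + \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}} \prod_{l \in [L]} \nu(x + \vec{S}_l(\vec{h}) \cdot \vec{n}) = o(1)$

(90) $E_{x \in I; \vec{n} \in \Omega'; \vec{h} \in \Omega''} \sum_{w \le p \le R^{\log R}; p \notin \mathcal{P}_b} \frac{\log^{O(1)} p}{p} 1_{p | (\vec{P}_j(\vec{h}), \vec{Q}_{j,k}(\vec{h}) \cdot \vec{n}) \wedge (\vec{P}_{j'}(\vec{h}), \vec{Q}_{j',k}(\vec{h}) \cdot \vec{n})} \prod_{l \in [L]} \nu(x + \vec{S}_l(\vec{h}) \cdot \vec{n}) = o(1)$

for all $1 \le j < j' \le J$ and $k \in [K]$.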

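The bound (88) already follows from Corollary 11.2 and the hypotheses on $\vec{S}_l$.

Now we turn to (89). We rewrite the left-hand side as

(91) $\sum_{w \le p \le R^{\log R}} E_{\vec{n} \in \Omega'; \vec{h} \in \Omega''} \left(1_{p | \vec{P}_j(\vec{h})} E_{x \in I} 1_{x = -\vec{Q}_{j,k}(\vec{h}) \cdot \vec{n} \bmod p} \prod_{l \in [L]} \nu(x + \vec{S}_l(\vec{h}) \cdot \vec{n})\right).$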

Let us first consider the contributions of the primes $p$ which are larger than $R^{4L+1}$.
In this case we bound $\nu$ extremely crudely by $O(R^2 \log R)$ (taking absolute values in
(72)), to bound the inner expectation of (91) by $O(\frac{1}{R^{4L+1}} (R^2 \log R)^L) = o(R^{-1/2})$
(say), and to show that

$E_{\vec{h} \in \Omega''} \sum_{R^{4L+1} \le p \le R^{\log R}} 1_{p | \vec{P}_j(\vec{h})} = O(R^{1/2}).$

But from the bounds on $\Omega''$ and $\vec{h}$ we see that $\vec{P}_j(\vec{h}) = O(R^{O(1)})$, and so at most
$O(1)$ primes $p$ can contribute to the sum for each $\vec{h}$. The claim follows.

Now we consider the contributions of the primes $p$ between $w$ and $R^{4L+1}$. We can
then apply Corollary 10.7 and estimate the inner expectation of (91) by



lo 

P 



1 




exp 




V 





E 
which by Lemma E.1 can be bounded by

y l<l<l'<L w <p'<Rlo S R W 

The contribution to (91) can thus be bounded by the sum of

~°( E heO" 1 P |P,(K)) 



P 

w<p<R}°& R 



and 



log^V 



w rj /ien"- L P |P3(S)V|s, 

1<;<Z'<L w<p,p'<R'°s « 



pp, fcefi" ViP; (ft) Vis, w-s,, (ft) ■ 



Now by hypothesis, the vector-valued polynomials $\vec{P}_j$ and $\vec{S}_l - \vec{S}_{l'}$ are non-zero,
thus by Lemma D.3 we have

$E_{\vec{h} \in A_1 \times \ldots \times A_{D''}} 1_{p' | \vec{S}_l(\vec{h}) - \vec{S}_{l'}(\vec{h})} = O(\frac{1}{M})$

whenever $A_1, \ldots, A_{D''} \subset F_{p'}$ have cardinality at least $M$. Applying Corollary C.3
and Lemma C.4 we conclude that

E hen" 1 P '\s l (h)-s l ,(h) = + 



A similar argument gives 

E ^i Pl P, (K )=o(i) + o(i) 

and hence by Cauchy-Schwarz 

E Ken" 1 )9 |p J (K) 1 P '\Si(h)-s l ,(h) = (( pp /)i/ 2 ) + °(^/2^ 

Applying all of these bounds, we can thus bound the total contribution of this case
to (91) by



£ W) + o ( ±> 1 + £ ^ 

^ — ' p p H s A — ' pp 

w<p<R}°z R w<p,p'<R l °z R 



^ ) + ^ 



which is $o(1)$ by (111), (112).

Finally, we consider (90). We first apply Theorem 11.1 to bound
V xeI H v{x+Si{h)-n) = O I cxp I O I J] £ 

\ V \ 1 <'< i '<- L iu<p'<P lo s«:p'|[S 1 (K)-S ! /(ft)]-n 



Once again we must extract out the "globally bad" primes. Let $\mathcal{P}'_b$ denote all the
primes $w \le p' \le R^{\log R}$ which divide $\vec{S}_l - \vec{S}_{l'}$ for some $1 \le l < l' \le L$. Since
these polynomials are non-zero and have coefficients $O(W^{O(1)})$, the product of all
the primes in $\mathcal{P}'_b$ is $O(W^{O(1)})$, and hence by Lemma E.3 as before these primes
contribute only $o(1)$ and can be discarded. If we then apply Lemma E.1 we can
bound the preceding expression by

+ E E g P > P °(V|[(s f 1 (K)-s ! ,(K))].s)- 

l<Kl'<Lw<p'<R 1 °s R :p'^Yl' b 

Thus to prove (90) it suffices to show the estimates

V log ° (1)p E 1 m 

v ^n,h 1 p\(P j (h),Q jik (h)-fi)A(P J ,(h),Q j , k (hyn)- U \ 1 ) 



W<p<R l °& R ;p<?:~P b 

and 



E 



log «plog°(V 



pp 

™<p,p'<R l ° sR ;p£~Pb,p'<?n'b 

for all $1 \le j < j' \le J$ and $1 \le l < l' \le L$, where $\vec{n}, \vec{h}$ are averaged over $\Omega' \cap
Z^{D'}$, $\Omega'' \cap Z^{D''}$ respectively. Applying Cauchy-Schwarz to the expectation in the latter
inequality and then factorising the double sum, we see that it will suffice to show
that

log° (1) p / \i/2 

( 92 ) p { E n,h 1 p\(P ] {h),Q ] Mhyn)A(P J ,(h),Q J ,^hyn)) = °W 

w<p<R l °z R \p£~Pb 

and 

log° (1 V / \V2 

w<p'<R l °z R :p<Zl[b> 

Since $p \notin \mathcal{P}_b$, we observe from Lemma D.3, Corollary C.3, Lemma C.4 that

^pttPAVaiAhY^HPrW'Qi'^)-™) = + <J<y W^ 

and the claim (92) now follows from (111), (112). The estimate (93) is proven
similarly. This (finally!) completes the proof of Theorem 12.1. □



Appendix A. Local Gowers uniformity norms 



In this appendix we shall collect a number of elementary inequalities based on the 
Cauchy-Schwarz inequality, including several related to Gowers-type uniformity 
norms. 

The formulation of the Cauchy-Schwarz inequality which we shall rely on is

(94) $|E_{a \in A, b \in B} f(a) g(a, b)|^2 \le (E_{a \in A} F(a)) (E_{a \in A} F(a) |E_{b \in B} g(a, b)|^2)$

whenever $f : A \to R$, $F : A \to R^+$, $g : A \times B \to R$ are functions on non-empty
finite sets $A, B$ with the pointwise bound $|f| \le F$.

One well known consequence of Cauchy-Schwarz is the van der Corput lemma,
which allows one to estimate a coarse-scale average of a function $f$ by coarse-scale
averages of "derivatives" of $f$ over short scales. The precise formulation we need
here is as follows.

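Lemma A.1 (van der Corput). Let $N$, $M$ and $H$ be as in Section^ Let $(x_m)_{m \in \mathbb{Z}}$
be a sequence of real numbers obeying the bound

(95)  $x_m = O_\varepsilon(N^\varepsilon)$

for any $\varepsilon > 0$ and $m \in \mathbb{Z}$. Then we have

(96)  $\mathbb{E}_{m \in [M]} x_m = \mathbb{E}_{h \in [H]} \mathbb{E}_{m \in [M]} x_{m+h} + o(1)$

and

(97)  $|\mathbb{E}_{m \in [M]} x_m|^2 \le \mathbb{E}_{h,h' \in [H]} \mathbb{E}_{m \in [M]} x_{m+h} x_{m+h'} + o(1)$.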

Proof. From (95) we see that

$\mathbb{E}_{m \in [M]} x_m = \mathbb{E}_{m \in [M]} x_{m+h} + o(1)$

for all $h \in [H]$; averaging over all $h$ and rearranging we obtain (96). Applying (94)
we conclude

$|\mathbb{E}_{m \in [M]} x_m|^2 \le \mathbb{E}_{m \in [M]} |\mathbb{E}_{h \in [H]} x_{m+h}|^2 + o(1)$

and (97) follows. □

We will use Lemma A.1 in only one place, namely Proposition 5.14, which is the
key inductive step needed to estimate a polynomial average by a collection of linear 
averages. 

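Next, we recall some Cauchy-Schwarz-Gowers inequalities, which can be found for
instance in [20, Appendix B]. Let $X$ be a finite non-empty set. If $A$ is a finite set
and $f : X^A \to \mathbb{R}$, define the Gowers box norm $\|f\|_{\Box^A}$ as

(98)  $\|f\|_{\Box^A} := \Big( \mathbb{E}_{m^{(0)}, m^{(1)} \in X^A} \prod_{\omega \in \{0,1\}^A} f\big((m^{(\omega_\alpha)}_\alpha)_{\alpha \in A}\big) \Big)^{1/2^{|A|}}$

where $\omega = (\omega_\alpha)_{\alpha \in A}$ and $m^{(i)} = (m^{(i)}_\alpha)_{\alpha \in A}$ for $i = 0, 1$. This is indeed a norm$^{33}$ for
$|A| \ge 2$. It obeys the Cauchy-Schwarz-Gowers inequality

(99)  $\Big| \mathbb{E}_{m^{(0)}, m^{(1)} \in X^A} \prod_{\omega \in \{0,1\}^A} f_\omega\big((m^{(\omega_\alpha)}_\alpha)_{\alpha \in A}\big) \Big| \le \prod_{\omega \in \{0,1\}^A} \|f_\omega\|_{\Box^A}$.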
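To make the definitions (98) and (99) concrete, here is a small Python sketch for the case $|A| = 2$, written by brute force over a tiny set $X$. It is only an illustration under the stated assumptions: the names `box_inner_product` and `box_norm` are hypothetical, and functions on $X^2$ are stored as dictionaries keyed by pairs.

```python
import itertools
import random


def box_inner_product(funcs, X):
    """Left-hand side of (99) without the absolute value, for |A| = 2.

    funcs maps each w = (w1, w2) in {0,1}^2 to a dict (x1, x2) -> real.
    The average runs over all pairs m0, m1 in X^2.
    """
    total = 0.0
    count = 0
    for m0 in itertools.product(X, repeat=2):
        for m1 in itertools.product(X, repeat=2):
            choices = (m0, m1)
            term = 1.0
            for w, f in funcs.items():
                # first coordinate taken from m^{(w1)}, second from m^{(w2)}
                point = (choices[w[0]][0], choices[w[1]][1])
                term *= f[point]
            total += term
            count += 1
    return total / count


def box_norm(f, X):
    """Gowers box norm (98) for |A| = 2, i.e. a fourth root of an average."""
    cube = list(itertools.product((0, 1), repeat=2))
    value = box_inner_product({w: f for w in cube}, X)
    # the average is non-negative in exact arithmetic; guard against rounding
    return max(value, 0.0) ** 0.25
```

With four random functions $f_\omega$ on $X^2$, the absolute value of `box_inner_product` should be at most the product of the four `box_norm` values, which is the content of (99) in this special case.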

We shall also need a weighted variant of this inequality. 

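Proposition A.2 (Weighted generalized von Neumann inequality). Let $A$ be a
non-empty finite set, and let $f : X^A \to \mathbb{R}$ be a function. For every $\alpha \in A$, let
$f_\alpha : X^{A \setminus \{\alpha\}} \to \mathbb{R}$ and $\nu_\alpha : X^{A \setminus \{\alpha\}} \to \mathbb{R}^+$ be functions with the pointwise bound
$|f_\alpha| \le \nu_\alpha$. Then we have

$\Big| \mathbb{E}_{m \in X^A} f(m) \prod_{\alpha \in A} f_\alpha(m|_{A \setminus \{\alpha\}}) \Big| \le \|f\|_{\Box^A(\nu)} \prod_{\alpha \in A} \|\nu_\alpha\|_{\Box^{A \setminus \{\alpha\}}}^{1/2}$

where $m|_{A \setminus \{\alpha\}}$ is the restriction of $m \in X^A$ to $X^{A \setminus \{\alpha\}}$, and $\|f\|_{\Box^A(\nu)}$ is the weighted
Gowers box norm of $f$, defined by the formula

$\|f\|_{\Box^A(\nu)}^{2^{|A|}} := \mathbb{E}_{m^{(0)}, m^{(1)} \in X^A} \prod_{\omega \in \{0,1\}^A} f\big((m^{(\omega_\alpha)}_\alpha)_{\alpha \in A}\big) \times \prod_{\alpha \in A} \prod_{\omega^{(\alpha)} \in \{0,1\}^{A \setminus \{\alpha\}}} \nu_\alpha\big((m^{(\omega^{(\alpha)}_\beta)}_\beta)_{\beta \in A \setminus \{\alpha\}}\big)$.

33 If $|A| = 0$ then $\|f\|_{\Box^A} = f(0)$, while if $|A| = 1$ then $\|f\|_{\Box^A} = |\mathbb{E} f|$.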



Proof. This is a special case of [23, Corollary B.4], in which all functions $f_B$, $\nu_B$
associated to subsets $B$ of $A$ of cardinality $|A| - 2$ or less are set equal to one. □



The local Gowers norm $U^{a_1,\ldots,a_d}_{\sqrt{M}}$ defined in (22) is related to the above Gowers box
norms by the obvious identity

(100)  $\|f\|_{U^{a_1,\ldots,a_d}_{\sqrt{M}}} = \big( \mathbb{E}_{x \in X} \|F_x\|^{2^d}_{\Box^d([\sqrt{M}])} \big)^{1/2^d}$

for any $f : X \to \mathbb{R}$, where for each $x \in X$ the function $F_x : [\sqrt{M}]^d \to \mathbb{R}$ is defined
by

$F_x(m_1, \ldots, m_d) := f(x + a_1 m_1 + \ldots + a_d m_d)$.

In particular, since $\Box^d$ is a norm for $d \ge 2$, we easily verify from Minkowski's
inequality that $U^{a_1,\ldots,a_d}_{\sqrt{M}}$ is a norm also when $d \ge 2$. This in turn implies that the
averaged local Gowers norms $U^{Q([H]^t, W)}_{\sqrt{M}}$ are also indeed norms.



Now we introduce the concept of concatenation of two or more averaged local
Gowers norms. If $Q \in \mathbb{Z}[h_1, \ldots, h_t, W]^d$ and $Q' \in \mathbb{Z}[h'_1, \ldots, h'_{t'}, W]^{d'}$ are a $d$-tuple
and $d'$-tuple of polynomials respectively, we define the concatenation $Q \oplus Q' \in
\mathbb{Z}[h_1, \ldots, h_t, h'_1, \ldots, h'_{t'}, W]^{d+d'}$ to be the $(d + d')$-tuple of polynomials whose first $d$
components are those of $Q$ (using the obvious embedding of $\mathbb{Z}[h_1, \ldots, h_t, W]$ into
$\mathbb{Z}[h_1, \ldots, h_t, h'_1, \ldots, h'_{t'}, W]$) and the last $d'$ components are those of $Q'$ (using the
obvious embedding of $\mathbb{Z}[h'_1, \ldots, h'_{t'}, W]$ into $\mathbb{Z}[h_1, \ldots, h_t, h'_1, \ldots, h'_{t'}, W]$). One can
similarly define concatenation of more than two tuples of polynomials in the obvious
manner.

The key lemma concerning concatenation is as follows. 

Lemma A.3 (Domination lemma). Let $k \ge 1$. For each $1 \le i \le k$, let $t_i \ge 0$ be
an integer and $Q_i \in \mathbb{Z}[h_1, \ldots, h_{t_i}, W]^{d_i}$ be a $d_i$-tuple of polynomials. Let $t := t_1 + \ldots + t_k$,
$d := d_1 + \ldots + d_k$, and let $Q \in \mathbb{Z}[h_1, \ldots, h_t, W]^d$ be the concatenation of all the $Q_i$.
Then we have

$\|g\|_{U^{Q_i([H]^{t_i}, W)}_{\sqrt{M}}} \le \|g\|_{U^{Q([H]^t, W)}_{\sqrt{M}}}$

for all $1 \le i \le k$ and $g : X \to \mathbb{R}$.

Proof. By induction we may take $k = 2$. By symmetry it thus suffices to show that

$\|g\|_{U^{Q([H]^t, W)}_{\sqrt{M}}} \le \|g\|_{U^{Q \oplus Q'([H]^{t+t'}, W)}_{\sqrt{M}}}$

for any $g : X \to \mathbb{R}$ and any $Q \in \mathbb{Z}[h_1, \ldots, h_t, W]^d$ and $Q' \in \mathbb{Z}[h'_1, \ldots, h'_{t'}, W]^{d'}$.
We may take $d' \ge 1$ as the case $d' = 0$ is trivial. From the definition of the averaged
local Gowers norms and Hölder's inequality it suffices to prove the estimate

$\|g\|_{U^{a_1,\ldots,a_d}_{\sqrt{M}}} \le \|g\|_{U^{a_1,\ldots,a_d,a'_1,\ldots,a'_{d'}}_{\sqrt{M}}}$

for all $a_1, \ldots, a_d, a'_1, \ldots, a'_{d'} \in \mathbb{Z}$. Applying (100), it suffices to prove the
monotonicity formula$^{34}$

$\|f\|_{\Box^d([\sqrt{M}])} \le \|f\|_{\Box^{d+d'}([\sqrt{M}])}$

for any $f : [\sqrt{M}]^d \to \mathbb{R}$, where we extend $f$ to $[\sqrt{M}]^{d+d'}$ by adding $d'$ dummy
variables, thus

$f(m_1, \ldots, m_d, m_{d+1}, \ldots, m_{d+d'}) := f(m_1, \ldots, m_d)$.

But this easily follows by raising both sides to the power $2^d$ and using the
Cauchy-Schwarz-Gowers inequality (99) for the $\Box^{d+d'}$ norm (setting $2^d$ factors equal to $f$,
and the other $2^{d+d'} - 2^d$ factors equal to 1). □



Appendix B. Uniform polynomial Szemeredi theorem 



In this appendix we use the Furstenberg correspondence principle and the
Bergelson-Leibman theorem to prove the quantitative polynomial Szemeredi theorem,
Theorem 3.2. The arguments here are reminiscent of those in ; see also |H7| for another
argument in a similar spirit.

Firstly, observe that to prove Theorem 3.2 it certainly suffices to do so in the case when $g$
is an indicator function $1_E$, since in the general case one can obtain a lower bound
$g \ge \frac{\delta}{2} 1_E$ where $E := \{x \in X : g(x) \ge \delta/2\}$, which must have measure at least
$\delta/2 - o(1)$.

Fix $P_1, \ldots, P_k$ and $\delta$, and suppose for contradiction that Theorem 3.2 failed. Then
(by the axiom of choice$^{35}$) we can find a sequence of $N$ going to infinity, and a
sequence of indicator functions $1_{E_N} : \mathbb{Z}/N\mathbb{Z} \to \mathbb{R}$ of density

(101)  $|E_N|/N \ge \delta - o(1)$

such that

$\lim_{N \to \infty} \mathbb{E}_{m \in [M]} \int_{\mathbb{Z}/N\mathbb{Z}} \prod_{i \in [k]} T_N^{P_i(Wm)/W} 1_{E_N} = 0$

where $T_N$ is the shift on $\mathbb{Z}/N\mathbb{Z}$ and $N$ is always understood to lie along the sequence
(recall that $M$ and $W$ both depend on $N$).

The next step is to use an averaging argument (dating back to Varnavides [HHi) to 
deal with the fact that M is growing rather rapidly in N. Let B > 1 be an integer, 



34 This is of course closely connected with the monotonicity of the Gowers $U^d$ norms, noted
for instance in [18].

35 It is not difficult to rephrase this argument so that the axiom of choice is not used; we leave
the details to the interested reader.
then for $N$ sufficiently large we have



and hence 




for all $b \in [B]$. In particular

and hence by the pigeonhole principle (and the axiom of choice) we can find $m_N \in
[M/B]$ for all sufficiently large $N$ such that



and hence 




for each $b \ge 1$.

We now eliminate the $W$ and $m_N$ dependence by "lifting" the one-dimensional shift
to many dimensions. Let $d$ be the maximum degree of the $P_1, \ldots, P_k$, then we may
write

$P_i(W b m_N)/W = \sum_{j \in [d]} W^{j-1} m_N^j b^j c_{i,j}$

for some integer constants $c_{i,j}$. Thus, if we set $T_{N,j} := T_N^{W^{j-1} m_N^j}$, we have
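(102)  $\lim_{N \to \infty} \int_{\mathbb{Z}/N\mathbb{Z}} \prod_{i \in [k]} \Big( \prod_{j \in [d]} T_{N,j}^{c_{i,j} b^j} \Big) 1_{E_N} = 0$ for all $b \ge 1$.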

Now we use the Furstenberg correspondence principle to take a limit. Let $\Omega$ be the
product space $\Omega := \{0, 1\}^{\mathbb{Z}^d}$, endowed with the usual product $\sigma$-algebra and with
the standard commuting shifts $T_1, \ldots, T_d$ defined by

$T_j((\omega_n)_{n \in \mathbb{Z}^d}) := (\omega_{n - e_j})_{n \in \mathbb{Z}^d}$ for $j \in [d]$

where $e_1, \ldots, e_d$ is the standard basis for $\mathbb{Z}^d$. We can define a probability measure
$\mu_N$ on this space by $\mu_N := \mathbb{E}_{x \in \mathbb{Z}/N\mathbb{Z}} \mu_{N,x}$, where $\mu_{N,x}$ is the Dirac measure on the
point

$\big( 1_{T_{N,1}^{n_1} \cdots T_{N,d}^{n_d} x \in E_N} \big)_{n \in \mathbb{Z}^d}$.

One easily verifies that $\mu_N$ is invariant under the commuting shifts $T_1, \ldots, T_d$. Also
if we let $A \subset \Omega$ be the cylinder set

$A := \{(\omega_n)_{n \in \mathbb{Z}^d} : \omega_0 = 1\}$

then we see from (101), (102) that $\mu_N(A) \ge \delta - o(1)$ and

$\lim_{N \to \infty} \mu_N\Big( \bigcap_{i \in [k]} \Big( \prod_{j \in [d]} T_j^{c_{i,j} b^j} \Big) A \Big) = 0$

for all $b \ge 1$. By weak sequential compactness, we may after passing to a
subsequence assume that the measures $\mu_N$ converge weakly to another probability
measure $\mu$ on $\Omega$, which is thus translation-invariant and obeys the bounds

$\mu(A) \ge \delta > 0$

and

$\mu\Big( \bigcap_{i \in [k]} \Big( \prod_{j \in [d]} T_j^{c_{i,j} b^j} \Big) A \Big) = 0$ for all $b \ge 1$.

But this contradicts the multidimensional Bergelson-Leibman recurrence theorem
(Theorem $A_0$ of Bergelson and Leibman). This contradiction concludes the proof of Theorem 3.2.



Appendix C. Elementary convex geometry 



In this paper we shall frequently be averaging over sets of the form $\Omega \cap \mathbb{Z}^D$, where
$\Omega \subset \mathbb{R}^D$ is a convex body. It is thus of interest to estimate the size of such sets.
Fortunately we will be able to do this using only very crude estimates (we only
need the main term in the asymptotics, and do not need the deeper theory of error
estimates). We shall bound the geometry of $\Omega$ using the inradius $r(\Omega)$; this is more
or less dual to the approach in [20], which uses instead the circumradius.

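Observe that the cardinality of $\Omega \cap \mathbb{Z}^D$ equals the Lebesgue measure of the Minkowski
sum $(\Omega \cap \mathbb{Z}^D) + [-1/2, 1/2]^D$ of $\Omega \cap \mathbb{Z}^D$ with the unit cube $[-1/2, 1/2]^D$. The latter
set differs from $\Omega$ only on the $\sqrt{D}/2$-neighborhood $N_{\sqrt{D}/2}(\partial\Omega)$ of the boundary
$\partial\Omega$. We thus have the Gauss bound

$|\Omega \cap \mathbb{Z}^D| = \mathrm{mes}(\Omega) + O(\mathrm{mes}(N_{\sqrt{D}/2}(\partial\Omega)))$

where $\mathrm{mes}()$ denotes Lebesgue measure. By dilation and translation, we thus have

(103)  $|\Omega \cap (m \cdot \mathbb{Z}^D + a)| = m^{-D} \big[ \mathrm{mes}(\Omega) + O(\mathrm{mes}(N_{m\sqrt{D}/2}(\partial\Omega))) \big]$

for any $m > 0$ and $a \in \mathbb{R}^D$.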

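Now we estimate the boundary term in terms of the inradius $r(\Omega)$ of $\Omega$.

Lemma C.1 (Gauss bound). Suppose that $\Omega \subset \mathbb{R}^D$ is a convex body. Then for
any $0 < r < r(\Omega)$ we have

$\mathrm{mes}(N_r(\partial\Omega)) = O_D\big( \frac{r}{r(\Omega)} \big) \mathrm{mes}(\Omega)$.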



Proof. We may rescale $r(\Omega) = 1$ (so $0 < r < 1$), and translate so that $\Omega$ contains
the open unit ball $B(0, 1)$. Elementary convex geometry then shows that for a
sufficiently large constant $C_D > 0$, we have

$B(x, r) \subset \Omega$ whenever $x \in (1 - C_D r) \cdot \Omega$

and

$B(x, r) \cap \Omega = \emptyset$ whenever $x \notin (1 + C_D r) \cdot \Omega$.

This shows that

$N_r(\partial\Omega) \subset [(1 + C_D r) \cdot \Omega] \setminus [(1 - C_D r) \cdot \Omega]$
and the claim follows. □
From Lemma C.1 and (103) we conclude that

(104)  $|\Omega \cap (m \cdot \mathbb{Z}^D + a)| = \big( 1 + O_D\big( \frac{m}{r(\Omega)} \big) \big) m^{-D} \mathrm{mes}(\Omega)$

whenever $0 < m < r(\Omega)$ and $a \in \mathbb{Z}^D$. As a corollary we obtain

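Corollary C.2 (Equidistribution of residue classes). Let $m \ge 1$ be an integer and
$a \in \mathbb{Z}^D$, and let $\Omega \subset \mathbb{R}^D$ be a convex body. If $r(\Omega) \ge C_D m$ for some sufficiently
large constant $C_D > 0$, then we have

$\mathbb{E}_{x \in \Omega \cap \mathbb{Z}^D} 1_{x \in m \cdot \mathbb{Z}^D + a} = \big( 1 + O_D\big( \frac{m}{r(\Omega)} \big) \big) m^{-D}$.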
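The equidistribution statement can be observed numerically in a simple case. The sketch below is an illustration only: it takes $D = 2$ and lets the convex body be a disk centred at the origin (so that the inradius equals the radius), then measures the proportion of lattice points lying in a fixed residue class. The function name `residue_class_density` is hypothetical.

```python
def residue_class_density(radius, m, a):
    """Fraction of lattice points of the disk x^2 + y^2 <= radius^2
    that lie in the residue class m * Z^2 + a, where a = (a1, a2)."""
    total = 0
    in_class = 0
    bound = int(radius) + 1
    for x in range(-bound, bound + 1):
        for y in range(-bound, bound + 1):
            if x * x + y * y <= radius * radius:
                total += 1
                if (x - a[0]) % m == 0 and (y - a[1]) % m == 0:
                    in_class += 1
    return in_class / total
```

For a disk of radius 200 and $m = 5$, the returned density should be close to $m^{-2} = 1/25$, with a relative error controlled by the ratio of $m$ to the radius.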



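This lets us average $m$-periodic functions on convex bodies as follows.

Corollary C.3 (Averaging lemma). Let $m \ge 1$ be an integer, and let $f : \mathbb{Z}^D \to \mathbb{R}^+$
be a non-negative $m$-periodic function (thus $f$ can also be identified with a function
on $\mathbb{Z}_m^D$). Let $\Omega \subset \mathbb{R}^D$ be a convex body. If $r(\Omega) \ge C_D m$ for some sufficiently large
constant $C_D > 0$, then we have

$\mathbb{E}_{x \in \Omega \cap \mathbb{Z}^D} f(x) = \big( 1 + O_D\big( \frac{m}{r(\Omega)} \big) \big) \mathbb{E}_{y \in \mathbb{Z}_m^D} f(y)$.

Proof. We expand the left-hand side as

$\mathbb{E}_{x \in \Omega \cap \mathbb{Z}^D} f(x) = \sum_{y \in \mathbb{Z}_m^D} f(y) \, \mathbb{E}_{x \in \Omega \cap \mathbb{Z}^D} 1_{x \in m \cdot \mathbb{Z}^D + y}$

and apply Corollary C.2. □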

Corollary C.3 is no longer useful when the period $m$ is large compared to the
inradius $r(\Omega)$. In such cases we shall need to rely instead on the following cruder
estimate.

Lemma C.4 (Covering inequality). Let $\Omega \subset \mathbb{R}^D$ be a convex body with $r(\Omega) \ge C_D$
for some large constant $C_D > 1$, and let $f : \mathbb{Z}^D \to \mathbb{R}^+$ be an arbitrary function.
Then

$\mathbb{E}_{x \in \Omega \cap \mathbb{Z}^D} f(x) \ll_D \sup_{y \in \mathbb{R}^D} \mathbb{E}_{x \in (y + [-r(\Omega), r(\Omega)]^D) \cap \mathbb{Z}^D} f(x)$.



Proof. From (104) we have $|\Omega \cap \mathbb{Z}^D| \sim \mathrm{mes}(\Omega)$, so it suffices to show that

$\sum_{x \in \Omega \cap \mathbb{Z}^D} f(x) \ll_D \mathrm{mes}(\Omega) \sup_{y \in \mathbb{R}^D} \mathbb{E}_{x \in (y + [-r(\Omega), r(\Omega)]^D) \cap \mathbb{Z}^D} f(x)$.

This will follow if we can cover $\Omega$ by $O_D(\mathrm{mes}(\Omega)/r(\Omega)^D)$ translates of the cube
$[-r(\Omega), r(\Omega)]^D$.

By rescaling and translating, we reduce to verifying the following fact: if $\Omega$ is a
convex body containing $B(0, 1)$, then $\Omega$ can be covered by $O_D(\mathrm{mes}(\Omega))$ translates
of $[-1, 1]^D$. To see this, we use a covering argument of Ruzsa [2B]. First observe
that because the cube $[-1/2, 1/2]^D$ is contained in a dilate of $B(0, 1)$ (and hence
$\Omega$) by $O_D(1)$, the Minkowski sum $\Omega + [-1/2, 1/2]^D$ is also contained in an $O_D(1)$-dilate
of $\Omega$ and thus has volume $O_D(\mathrm{mes}(\Omega))$. Now let $x_1 + [-1/2, 1/2]^D, \ldots, x_N +
[-1/2, 1/2]^D$ be a maximal collection of disjoint shifted cubes with $x_1, \ldots, x_N \in \Omega$,
then by the previous volume bound we have $N = O_D(\mathrm{mes}(\Omega))$. But by maximality
we see that the cubes $x_1 + [-1, 1]^D, \ldots, x_N + [-1, 1]^D$ must cover $\Omega$, and the claim
follows. □



Appendix D. Counting points of varieties over F p 

Let $R$ be an arbitrary ring, and let $P_1, \ldots, P_J \in R[x_1, \ldots, x_D]$ be polynomials. Our
interest here is to control the "density" of the (affine) algebraic variety

$\{(x_1, \ldots, x_D) \in R^D : P_j(x_1, \ldots, x_D) = 0 \text{ for all } 1 \le j \le J\}$

and more precisely to estimate quantities such as

(105)  $\mathbb{E}_{x_1 \in A_1, \ldots, x_D \in A_D} \prod_{j \in [J]} 1_{P_j(x_1, \ldots, x_D) = 0}$

for certain finite non-empty subsets $A_1, \ldots, A_D \subset R$ (typically the $A_i$ will either
be all of $R$, or some arithmetic progression). We are particularly interested in the
case when $R$ is the finite field $\mathbb{F}_p$, but in order to also encompass the case of the
integers $\mathbb{Z}$ (and of polynomial rings over $\mathbb{F}_p$ or $\mathbb{Z}$), we shall start by working in the
more general context of a unique factorization domain.

Of course, the proper way to do this would be to use the tools of modern algebraic 
geometry, for instance using the concepts of generic point and algebraic dimension 
of varieties. Indeed, the results in this appendix are "morally trivial" if one uses the 
fact that the codimension of an algebraic variety is preserved under restriction to
generic subspaces. However, to keep the exposition simple we have chosen a very 
classical, pedestrian and elementary approach, to emphasise that the facts from 
algebraic geometry which we will need are not very advanced. 

From the factor theorem (which is valid over any ring) we have 

Lemma D.1 (Generic points of a one-dimensional polynomial). Let $P \in R[x]$ be
a polynomial of one variable of degree at most $d$ over a ring $R$. If $P \ne 0$, then
$P(x) \ne 0$ for all but at most $d$ values of $x \in R$.

As a corollary, we obtain 

Corollary D.2 (Generic points of a multi-dimensional polynomial). Let $P \in
R[x_1, \ldots, x_D]$ be a polynomial of $D$ variables of degree at most $d$ over a ring
$R$. If $P \ne 0$, then $P(\cdot, x_D) \ne 0$ for all but at most $d$ values of $x_D \in R$, where
$P(\cdot, x_D) \in R[x_1, \ldots, x_{D-1}]$ is the polynomial of $D - 1$ variables formed from $P$ by
substituting the value $x_D$ for the last variable.
Proof. View $P$ as a one-dimensional polynomial of $x_D$ with coefficients in the ring
$R[x_1, \ldots, x_{D-1}]$ (which contains $R$), and apply Lemma D.1. □

As a consequence, we obtain a "baby combinatorial Nullstellensatz" (cf. pQ): 

Lemma D.3 (Baby Nullstellensatz). Let $P \in R[x_1, \ldots, x_D]$ be a polynomial of $D$
variables of degree at most $d$ over a ring $R$. Let $A_1, \ldots, A_D$ be finite subsets of $R$
with $|A_1|, \ldots, |A_D| \ge M$ for some $M > 0$. If $P \ne 0$, then

$\mathbb{E}_{x_1 \in A_1, \ldots, x_D \in A_D} 1_{P(x_1, \ldots, x_D) = 0} \le \frac{Dd}{M}$.

Proof. We induct on $D$. The case $D = 0$ is vacuous. Now suppose $D \ge 1$ and the
claim has already been proven for $D - 1$. By Corollary D.2 we have $P(\cdot, x_D) \ne 0$ for
all but at most $d$ values of $x_D \in A_D$. The exceptional values of $x_D$ can contribute
at most $\frac{d}{M}$, while the remaining values of $x_D$ will contribute at most $\frac{(D-1)d}{M}$ by the
induction hypothesis. This completes the induction. □
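The bound $Dd/M$ is easy to test by brute force when $R = \mathbb{F}_p$ and $A_1 = \ldots = A_D = \mathbb{F}_p$ (so $M = p$). The Python sketch below is only an illustration; the names `zero_density`, `example_poly` and `product_poly` are hypothetical, and the two polynomials are arbitrary test cases of degree 3 in $D = 3$ variables, not objects from the paper.

```python
import itertools


def zero_density(poly, p, num_vars):
    """Fraction of points x in F_p^num_vars with poly(*x) = 0 mod p."""
    zeros = sum(
        1
        for x in itertools.product(range(p), repeat=num_vars)
        if poly(*x) % p == 0
    )
    return zeros / p ** num_vars


def example_poly(x, y, z):
    # arbitrary non-zero polynomial of degree 3 (hypothetical test case)
    return x * y * z + x * x - y + 2


def product_poly(x, y, z):
    # vanishes exactly when some coordinate is zero
    return x * y * z
```

For `product_poly` the density can be computed by hand as $1 - (1 - 1/p)^3$, which is roughly $3/p$ and hence within the bound $Dd/M = 9/p$ of Lemma D.3.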

This gives us a reasonable upper bound on the quantity (105) in the case $J = 1$,
which then trivially implies the same bound for $J > 1$. However, we expect to
do better than $1/M$ type bounds for higher $J$ when the polynomials $P_1, \ldots, P_J$ are
jointly coprime. To exploit the property of being coprime we recall the classical
resultant of two polynomials.

Definition D.4 (Resultant). Let $R$ be a ring, let $d, d' \ge 1$, and let

$P = a_0 + a_1 x + \ldots + a_d x^d; \quad Q = b_0 + b_1 x + \ldots + b_{d'} x^{d'}$

be two polynomials in $R[x]$ of degree at most $d$, $d'$ respectively. Then the resultant
$\mathrm{Res}_{d,d'}(P, Q) \in R$ is defined to be the determinant of the $(d + d') \times (d + d')$ matrix
whose rows are the coefficients in $R$ of the polynomials $P, xP, \ldots, x^{d'-1}P, Q, xQ, \ldots, x^{d-1}Q$
with respect to the basis $1, x, \ldots, x^{d+d'-1}$.

More generally, if $D \ge 1$, $1 \le i \le D$, $d_i, d'_i \ge 1$, and $P, Q$ are two polynomials in
$R[x_1, \ldots, x_D]$ with $\deg_{x_i}(P) \le d_i$ and $\deg_{x_i}(Q) \le d'_i$, then we define the resultant
$\mathrm{Res}_{d_i, d'_i, x_i}(P, Q) \in R[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D]$ by viewing $P$ and $Q$ as
one-dimensional polynomials of $x_i$ over the ring $R[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D]$ and using
the one-dimensional resultant defined earlier.

Example D.5. If $d = d' = 1$, then the resultant of $a + bx$ and $c + dx$ is $ad - bc$,
and the resultant of $a(x_1) + b(x_1) x_2$ and $c(x_1) + d(x_1) x_2$ in the $x_2$ variable is
$a(x_1) d(x_1) - b(x_1) c(x_1)$.
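The one-dimensional resultant can be computed directly from its definition as a determinant. The following Python sketch does this for polynomials with integer or rational coefficients; it is an illustration only, the names `sylvester_matrix`, `determinant` and `resultant` are hypothetical, and polynomials are represented as coefficient lists `[c0, c1, ..., c_deg]` in increasing degree.

```python
from fractions import Fraction


def sylvester_matrix(P, Q):
    """Rows are the coefficients of P, xP, ..., x^{e-1}P, Q, xQ, ..., x^{d-1}Q
    in the basis 1, x, ..., x^{d+e-1}, where d = deg bound of P, e = that of Q."""
    d, e = len(P) - 1, len(Q) - 1
    size = d + e
    rows = []
    for shift in range(e):
        rows.append([0] * shift + list(P) + [0] * (size - d - 1 - shift))
    for shift in range(d):
        rows.append([0] * shift + list(Q) + [0] * (size - e - 1 - shift))
    return rows


def determinant(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def resultant(P, Q):
    return determinant(sylvester_matrix(P, Q))
```

With $P = 3 + 2x$ and $Q = 5 + 7x$ this reproduces $ad - bc = 11$ from Example D.5, and the resultant of $x^2 - 1$ and $x - 1$ vanishes because the two polynomials share the root $x = 1$.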

Let $P, Q \in R[x]$ have degrees $d, d'$ respectively for some $d, d' \ge 1$, where $R$ is a
unique factorization domain. By splitting the determinant into a matrix and its adjugate,
we obtain an identity

(106)  $\mathrm{Res}_{d,d'}(P, Q) = AP + BQ$

for some polynomials $A, B \in R[x]$ of degree at most $d' - 1$ and $d - 1$ respectively.
Thus if $P, Q$ are irreducible and coprime, then the resultant cannot vanish by unique
factorization. The same extends to higher dimensions:

Lemma D.6. Let $R$ be a unique factorization domain, let $1 \le i \le D$, and suppose
that $P, Q \in R[x_1, \ldots, x_D]$ are such that $\deg_{x_i}(P) = d_i \ge 1$ and $\deg_{x_i}(Q) = d'_i \ge 1$.
If $P$ and $Q$ are irreducible and coprime, then $\mathrm{Res}_{d_i, d'_i, x_i}(P, Q) \ne 0$.



Proof. View $P, Q$ as one-dimensional polynomials over the unique factorization
domain $R[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D]$ and apply the preceding argument. □

Lemma D.7 (Generic points of multiple polynomials). Let $P_1, \ldots, P_J \in R[x_1, \ldots, x_D]$
have degrees at most $d$ over a unique factorization domain $R$, and suppose that all
the $P_1, \ldots, P_J$ are non-zero and jointly coprime. Then $P_1(\cdot, x_D), \ldots, P_J(\cdot, x_D)$ are
non-zero and jointly coprime for all but at most $O_{J,d}(1)$ values of $x_D \in R$.



Proof. By splitting each of the $P_j$ into factors we may assume that all the $P_j$ are
irreducible. By eliminating any two polynomials which are scalar multiples of each
other we may then assume that the $P_j$ are pairwise coprime. The claim is vacuous
for $J < 2$, so it will suffice to verify the claim for $J = 2$.

Suppose first that $P_1$ is constant in $x_D$. Then $P_1(\cdot, x_D) = P_1$ is irreducible, and
the only way it can fail to be coprime to $P_2(\cdot, x_D)$ is if $P_2(\cdot, x_D)$ is a multiple of
$P_1$. But we know that $P_2$ itself is not a multiple of $P_1$; viewing $P_2$ modulo $P_1$ as
a polynomial of degree at most $d$ in $x_D$ over the ring $R[x_1, \ldots, x_{D-1}]/(P_1)$ we see
from Lemma D.1 that the number of exceptional $x_D$ is at most $d$.

A similar argument works if $P_2$ is constant in $x_D$. So now we may assume that
$\deg_{x_D}(P_1) = d_1$ and $\deg_{x_D}(P_2) = d_2$ for some $d_1, d_2 \ge 1$, which allows us to
compute the resultant $\mathrm{Res}_{d_1, d_2, x_D}(P_1, P_2) \in R[x_1, \ldots, x_{D-1}]$. By Lemma D.6 this
resultant is non-zero; also by definition we see that the resultant has degree $O_d(1)$.

From (106) we see that if $P_1(\cdot, x_D)$ and $P_2(\cdot, x_D)$ have any common factor in
$R[x_1, \ldots, x_{D-1}]$ (which we may assume to be irreducible), then this factor must
also divide $\mathrm{Res}_{d_1, d_2, x_D}(P_1, P_2)$. From degree considerations we see that there are
at most $O_d(1)$ such factors. Let $Q \in R[x_1, \ldots, x_{D-1}]$ be one such possible factor.
Since $P_1$ and $P_2$ are coprime, we know that $Q$ cannot divide both $P_1$ and $P_2$; say
it does not divide $P_2$. Then by viewing $P_2$ modulo $Q$ as a polynomial of $x_D$ over
$R[x_1, \ldots, x_{D-1}]/(Q)$ as before we see that there are at most $d$ values of $x_D$ for which
$Q$ divides $P_2(\cdot, x_D)$. Putting this all together we obtain the claim. □



This gives us a variant of Lemma D.3.

Lemma D.8 (Second baby Nullstellensatz). Let $P_1, \ldots, P_J \in R[x_1, \ldots, x_D]$ be
polynomials of $D$ variables of degree at most $d$ over a unique factorization domain
$R$. Let $A_1, \ldots, A_D$ be finite subsets of $R$ with $|A_1|, \ldots, |A_D| \ge M$ for some $M > 0$.
If all the $P_1, \ldots, P_J$ are non-zero and jointly coprime, then

$\mathbb{E}_{x_1 \in A_1, \ldots, x_D \in A_D} 1_{P_1(x_1, \ldots, x_D), \ldots, P_J(x_1, \ldots, x_D) \text{ non-zero, jointly coprime}} = 1 - O_{D,d,J}\big( \frac{1}{M} \big)$

and

$\mathbb{E}_{x_1 \in A_1, \ldots, x_D \in A_D} \prod_{j \in [J]} 1_{P_j(x_1, \ldots, x_D) = 0} \ll_{D,d,J} \frac{1}{M^2}$.

Remark D.9. One could obtain sharper results by using Bezout's lemma, but the 
result here will suffice for our applications. 

Proof. The first claim follows by repeating the proof of Lemma D.3 (replacing
Corollary D.2 by Lemma D.7) and we leave it to the reader. To prove the second
claim, we again induct on $D$. The base case $D = 0$ is again trivial, so assume $D \ge 1$
and the claim has already been proven for $D - 1$.

By Lemma D.7 we know that for all but $O_{J,d}(1)$ values of $x_D$, the polynomials
$P_1(\cdot, x_D), \ldots, P_J(\cdot, x_D)$ are all non-zero and coprime. This case will contribute
$O_{D,d,J}(\frac{1}{M^2})$ by the induction hypothesis. Now consider one of the $O_{J,d}(1)$
exceptional values of $x_D$. For each such value, at least one of the polynomials $P_j(\cdot, x_D)$
has to be non-zero, otherwise the linear polynomial in the last variable which vanishes
at that value would be a common factor of $P_1, \ldots, P_J$, a
contradiction. Applying Lemma D.3 we see that the contribution of each such $x_D$
is thus $O_{D,d,J}(\frac{1}{M^2})$, and the claim follows. □

We now specialize the above discussion to compute the local factors $c_p$ and $\overline{c}_p$
defined in Definition 9.1. We first observe the following easy upper bounds:

Lemma D.10 (Crude local bound). Let $P_1, \ldots, P_J \in \mathbb{Z}[x_1, \ldots, x_D]$ have degree at
most $d$, and let $p$ be a prime.

(i) If all the $P_1, \ldots, P_J$ vanish identically modulo $p$, then $c_p(P_1, \ldots, P_J) = 1$.

(ii) If at least one of $P_1, \ldots, P_J$ vanishes identically modulo $p$, then $\overline{c}_p(P_1, \ldots, P_J) = 0$.

(iii) If at least one of $P_1, \ldots, P_J$ is a non-zero constant modulo $p$, then $c_p(P_1, \ldots, P_J) = 0$.

(iv) If at least one of $P_1, \ldots, P_J$ is non-constant modulo $p$, then $c_p(P_1, \ldots, P_J) \ll_{d,D} 1/p$.

(v) If the $P_1, \ldots, P_J$ are jointly coprime modulo $p$, then $c_p(P_1, \ldots, P_J) \ll_{d,D,J} 1/p^2$.

Proof. (i), (ii), (iii) are trivial, while (iv) and (v) follow from Lemma D.3 and
Lemma D.8 respectively (setting $A_1 = \ldots = A_D = R = \mathbb{F}_p$). □

Now we can refine the bound for a single polynomial P in the case when P is linear 
in one variable, with linear and constant coefficients coprime. 

Lemma D.11 (Linear case). Let $P \in \mathbb{Z}[x_1, \ldots, x_D]$ have degree at most $d$, and let
$p$ be a prime. Suppose that $P \bmod p$ is linear in the $x_i$ variable for some $1 \le i \le D$,
thus we have

$P(x_1, \ldots, x_D) = P_1(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D) x_i + P_0(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D) \bmod p$
for some polynomials $P_0, P_1 \in \mathbb{F}_p[x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_D]$. Suppose also that the
linear coefficient $P_1$ is non-zero and coprime to the constant coefficient $P_0$. Then
$c_p(P) = 1/p + O_{d,D}(1/p^2)$.

Proof. Let us split $\mathbb{F}_p^{D-1} = A \cup B \cup C$, where $A$ is the subset of $\mathbb{F}_p^{D-1}$ where
$P_1 \ne 0$, $B$ is the subset of $\mathbb{F}_p^{D-1}$ where $P_1 = 0$ and $P_0 \ne 0$, and $C$ is the subset of
$\mathbb{F}_p^{D-1}$ where $P_1 = P_0 = 0$. Then an elementary counting argument shows that

$c_p(P) = \frac{|A| + |C| p}{p^D} = \frac{1}{p} - \frac{|B| + |C|}{p^D} + \frac{|C|}{p^{D-1}}$.

Since $P_1$ is not zero, we see from Lemma D.10(iv) that $|B| + |C| \ll_{d,D} p^{D-2}$. Since
$P_1, P_0$ are coprime modulo $p$, we see from Lemma D.10(v) that $|C| \ll_{d,D} p^{D-3}$. The
claim follows. □
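The role of the coprimality hypothesis in Lemma D.11 can be seen on two small examples, computed here by brute force. This Python sketch is only an illustration; the name `local_factor` is hypothetical and simply counts zeros in $\mathbb{F}_p^D$, which is how we read the local factor $c_p$ for a single polynomial. The two test polynomials are our own choices, not taken from the paper.

```python
import itertools
from fractions import Fraction


def local_factor(poly, p, num_vars):
    """Exact proportion of points of F_p^num_vars at which poly vanishes mod p."""
    zeros = sum(
        1
        for x in itertools.product(range(p), repeat=num_vars)
        if poly(*x) % p == 0
    )
    return Fraction(zeros, p ** num_vars)


def coprime_case(x1, x2):
    # linear in x2 with linear coefficient x1 and constant coefficient 1 (coprime)
    return x1 * x2 + 1


def non_coprime_case(x1, x2):
    # linear in x2 with linear coefficient x1 and constant coefficient x1 (not coprime)
    return x1 * x2 + x1
```

In the coprime case one gets exactly $(p-1)/p^2 = 1/p - 1/p^2$, consistent with the lemma, while in the non-coprime case the count is $(2p-1)/p^2$, which is roughly $2/p$ and so falls outside the conclusion.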



We can now quickly prove Lemma 9.5.

Proof of Lemma 9.5. The claims (a), (b), (d) follow from Lemma D.10 and Definition
9.4, while (c) follows from Lemma D.11 and Definition 9.4. The claim (e) is trivial,
and the claim (f) follows from (a), (b) and fTTjl . □ 



Appendix E. The distribution of primes 



In this section we recall some classical results about the distribution of primes. 
For $\mathrm{Re}(s) > 1$, define the Riemann zeta function

$\zeta(s) := \sum_{n=1}^\infty \frac{1}{n^s}$.

Our argument will be elementary enough that we will not need the meromorphic
continuation of $\zeta$ to the region $\mathrm{Re}(s) \le 1$. From the unique factorization of the
natural numbers, we have the Euler product formula

(107)  $\zeta(s) = \prod_p \big( 1 - \frac{1}{p^s} \big)^{-1}$.

We also have the bounds

(108)  $\zeta(s) = \frac{1}{s - 1} + O(\log(2 + |\mathrm{Im}(s)|))$ and $\frac{1}{\zeta(s)} = O(\log(2 + |\mathrm{Im}(s)|))$

whenever $1 < \mathrm{Re}(s) < 10$ (see e.g. Chapter 3]).
From the prime number theorem

$\sum_{p < x} 1 = (1 + o(1)) \frac{x}{\log x}$ as $x \to \infty$

(which, incidentally, can be deduced readily from (108)), and summation by parts,
we easily obtain the estimates

(109)  $\sum_{p < x} \log p = x + o(x)$ as $x \to \infty$

(110)  $\sum_{p < x} \frac{1}{p} = \log\log(10 + x) + O(1)$ for $x > 0$

(111)  $\sum_{p < x} \frac{\log^K p}{p} \ll_K \log^K(10 + x)$ for $K > 0$, $x > 0$

( 112 ) £ ^ ^L-^ fOT K>0,x>l, Re( S ) > I- 
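The estimates (109) and (110) are easy to observe numerically. The Python sketch below is an illustration only: it uses a standard sieve of Eratosthenes, and the names `primes_below`, `chebyshev_theta` and `reciprocal_prime_sum` are hypothetical.

```python
import math


def primes_below(limit):
    """Sieve of Eratosthenes: all primes p < limit (assumes limit >= 2)."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for n in range(2, math.isqrt(limit - 1) + 1):
        if sieve[n]:
            # cross out the multiples n*n, n*n + n, ... below limit
            sieve[n * n::n] = bytearray(len(range(n * n, limit, n)))
    return [n for n in range(limit) if sieve[n]]


def chebyshev_theta(x):
    """Sum of log p over primes p < x, the left-hand side of (109)."""
    return sum(math.log(p) for p in primes_below(x))


def reciprocal_prime_sum(x):
    """Sum of 1/p over primes p < x, the left-hand side of (110)."""
    return sum(1.0 / p for p in primes_below(x))
```

For $x = 10^5$ the ratio `chebyshev_theta(x) / x` is close to 1, and the difference `reciprocal_prime_sum(x) - math.log(math.log(10 + x))` stays bounded as $x$ varies, in line with (109) and (110).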
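In a similar spirit we have whenever $1 < \mathrm{Re}(s) < 2$ and $x \ge 2$

$\sum_{p \ge x} \log\big( 1 - \frac{1}{p^s} \big) \ll \sum_{p \ge x} \frac{1}{p^{\mathrm{Re}(s)}}$

$\ll \sum_{n=0}^\infty \sum_{2^n x \le p < 2^{n+1} x} \frac{1}{(2^n x)^{\mathrm{Re}(s)}}$

$\ll \frac{1}{(\mathrm{Re}(s) - 1) x^{\mathrm{Re}(s) - 1} \log x}$.

In particular, when $\mathrm{Re}(s) = 1 + 1/\log R$ and $x = R^{\log R}$ we have

$\sum_{p \ge R^{\log R}} \log\big( 1 - \frac{1}{p^s} \big) = o(1)$

and hence from (107)

(113)  $\prod_{p < R^{\log R}} \big( 1 - \frac{1}{p^s} \big)^{-1} = (1 + o(1)) \zeta(s)$ whenever $\mathrm{Re}(s) = 1 + 1/\log R$.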

We will frequently encounter expressions of the form $\exp(K \sum_{p \in \mathcal{P}} \frac{1}{p})$, where $\mathcal{P}$
ranges over some set of primes (typically finite). Such sums can eventually be
somewhat large, thanks to (110). Fortunately the very slow nature of the divergence
of $\sum_p \frac{1}{p}$ lets us estimate this exponential by a slowly divergent sum over primes,
conceding only a few logarithms.

Lemma E.1 (Exponentials can be replaced by logarithms). Let $\mathcal{P}$ be any set of
primes, and let $K \ge 1$. Then

$\exp\big( K \sum_{p \in \mathcal{P}} \frac{1}{p} \big) = 1 + O_K\big( \sum_{p \in \mathcal{P}} \frac{\log^K p}{p} \big)$

or equivalently 

K 



v ^— ^ p
Remark E.2. Note that the sum is only over primes in $\mathcal{P}$, rather than products of
primes in $\mathcal{P}$, which would have been the case if we had written $\exp(K \sum_{p \in \mathcal{P}} \frac{1}{p}) =
\prod_{p \in \mathcal{P}} \exp(K/p) = \prod_{p \in \mathcal{P}} (1 + O_K(1/p))$. The fact that we keep the modulus prime
is useful for applications, as it allows us to work over fields $\mathbb{F}_p$ rather than mere
rings $\mathbb{Z}_N$ when performing certain local counting estimates. This lets us avoid
certain technical issues involving zero divisors which would otherwise complicate
the argument. The additional logarithmic powers of $p$ are sometimes dangerous,
but in several cases we will be able to acquire an additional factor of $\frac{1}{p}$ from an
averaging argument, which will make the summation on the right-hand side safely
convergent regardless of how many logarithms are present, thanks to (112).



Proof. Let us fix $K$ and suppress the dependence of the $O()$ notation on $K$. By a
limiting argument we may take $\mathcal{P}$ to be finite. We expand the left-hand side as a
power series

$1 + \sum_{n=1}^\infty \frac{K^n}{n!} \sum_{p_1, \ldots, p_n \in \mathcal{P}} \frac{1}{p_1 \ldots p_n}$.

By paying a factor of $n$ we may assume that $p_n$ is greater than or equal to the other
primes, thus bounding the previous expression by

$1 + \sum_{n=1}^\infty \frac{K^n}{(n-1)!} \sum_{p_n \in \mathcal{P}} \sum_{p_1, \ldots, p_{n-1} \in \mathcal{P}; \; p_1, \ldots, p_{n-1} \le p_n} \frac{1}{p_1 \ldots p_n}$.

We rewrite $p_n$ as $p$ and rearrange this as

$1 + \sum_{p \in \mathcal{P}} \frac{1}{p} \sum_{n=1}^\infty \frac{K^n}{(n-1)!} \Big( \sum_{p' \in \mathcal{P} : p' \le p} \frac{1}{p'} \Big)^{n-1}$.

From (110) we have

$\sum_{p' \in \mathcal{P} : p' \le p} \frac{1}{p'} \le \log\log(10 + p) + O(1)$

and so we can bound the previous expression by

$1 + \sum_{p \in \mathcal{P}} \frac{1}{p} \sum_{n=1}^\infty \frac{K (K \log\log(10 + p) + O(1))^{n-1}}{(n-1)!}$.

Summing the power series we obtain the result. □



Finally, we record a very simple lemma, using the quantities w and W defined in 
Section IO 

Lemma E.3 (Divisor bound). Let $\mathcal{P}$ be any collection of primes such that $\prod_{p \in \mathcal{P}} p \le
M W^M$ for some $M > 0$. Then

$\sum_{p \ge w : p \in \mathcal{P}} \frac{1}{p} = o_M(1)$.

Proof. We trivially bound $\frac{1}{p}$ by $\frac{1}{w}$, and observe that the number of primes in $\mathcal{P}$
larger than $w$ is at most $\log(M W^M)/\log(w) = o_M(\log W)$. But from (109) we have